Preface


Originally, functional analysis was the study of functions. It is now considered to 
be a unifying subject that generalizes much of linear algebra and real/complex 
analysis, with emphasis on infinite dimensional spaces. This book introduces this 
vast topic from these elementary preliminaries and develops both the abstract 
theory and its applications in three parts: (I) Metric Spaces, (II) Banach and Hilbert 
Spaces, and (III) Banach Algebras. 

Especially with the digital revolution at the turn of the millennium, Hilbert 
spaces and least squares approximation have become necessary and fundamental 
topics for a mathematical education, not only for mathematicians, but also for
engineers, physicists, and statisticians interested in signal processing, data analysis,
regression, quantum mechanics, etc. Banach spaces, in particular $L^1$ and $L^\infty$
methods, have gained popularity in applications and are complementing or even 
supplanting the classical least squares approach to many optimization problems. 


Aim of this Book 

The main aim of this book is to provide the reader with an introductory textbook 
that starts from elementary linear algebra and real analysis and develops the theory 
sufficiently to understand how various applications, including least squares 
approximation, etc., are all part of a single framework. A textbook must try to 
achieve a balance between rigor and understanding: not being too elementary by 
omitting ‘hard’ proofs, but neither too advanced by using too strict a language for 
the average reader and treating theorems as mere stepping stones to yet other 
theorems. Despite the multitude of books in this area, there is still a perceived gap 
in learning difficulty between undergraduate and graduate textbooks. This book 
aims to be in the middle: it covers much material and has many exercises of 
varying difficulty, yet the emphasis is for the student to remember the theory 
clearly using intuitive language. For example, real analysis is redeveloped from 
the broader picture of metric spaces (including a construction of the real number 
space), rather than through the even more abstract topological spaces. 
Audience

This book is meant for the undergraduate who is interested in mathematical 
analysis and its applications, or the research engineer/statistician who would like a 
more rigorous approach to fundamental mathematical concepts and techniques. It 
can also serve as a reference or for self-study of a subject that occupies a central 
place in modern mathematics, opening up many avenues for further study. 

The basic requirements are mainly the introductory topics of mathematics: Set 
and Logic notation, Vector Spaces, and Real Analysis (calculus). Apart from these, 
it would be helpful, but not necessary, to have taken elementary courses in Fourier 
Series, Lebesgue Integration, and Complex Analysis. Reviews of Vector Spaces 
and Measurable sets are included in this book, while the other two mentioned 
subjects are developed only to the extent needed. 

Examples are included from many areas of mathematics that touch upon 
functional analysis. It would be helpful, at the appropriate places, for the reader to
have encountered these other subjects, but this is not essential. The aim is to make
connections and describe them from the viewpoint of functional analysis. With the
modern facilities of searching over the Internet, anyone interested in following up
a specific topic can easily do so.

The sections follow each other in a linear fashion, with the three parts fitting 
into three one-semester courses, although Part II is twice as long as the others. The 
following sections may be omitted without much effect on subsequent topics: 

Section 6.4 C(X, Y) 

Section 9.2 Function Spaces 

Section 11.5 Pointwise and Weak Convergence

Sections 12.1 and 12.2 Differentiation and Integration 

Sections 14.4 and 14.5 Functional Calculus and the Gelfand Transform 

Section 15.4 Representation Theorems. 
Chapter 1

Introduction 


Much of modern mathematics depends upon extending the finite to the infinite. 
In this regard, imagine extending the geometric vectors that we are familiar with to 
an infinite number of components. That is, consider 


$$v = a_1 e_1 + a_2 e_2 + \cdots = (a_1, a_2, a_3, \ldots)$$

where $e_i$ are unit independent vectors just like $i$, $j$ and $k$ in Cartesian geometry. It is
not at all clear that we can do so; for starters, what do those three dots “$\cdots$” on
the right-hand side mean? Surely they signify that as more terms are taken one gets
better approximations of $v$. This immediately suggests that not every such “infinite”
vector is allowed; for example, it might be objected that the vector


$$v = e_1 + e_2 + e_3 + \cdots$$


cannot be approximated by a finite number of these unit vectors, as the remainder
$e_N + \cdots$ looks as large as $v$. Instead we might allow the infinite vector

$$v = e_1 + \tfrac{1}{2} e_2 + \tfrac{1}{3} e_3 + \cdots$$

although even here, it is unclear whether this may also grow large, just as

$$1 + \tfrac{1}{2} + \tfrac{1}{3} + \cdots = \infty.$$
To continue with our experiment, let us just say that the coefficients become zero 
rapidly enough. 

There are all sorts of things we can attempt to do with these “infinite” vectors, by 
analogy with the usual vectors: addition of vectors and multiplication by a number 
are easily accomplished, 

$$(1, \tfrac{1}{2}, \tfrac{1}{3}, \ldots) + (1, \tfrac{1}{2}, \tfrac{1}{4}, \ldots) = (2, 1, \tfrac{7}{12}, \ldots),$$
$$2 \times (0, 1, -\tfrac{1}{2}, \ldots) = (0, 2, -1, \ldots).$$
One can even generalize the “dot product” 


$$(a_1, a_2, \ldots) \cdot (b_1, b_2, \ldots) = a_1 b_1 + a_2 b_2 + \cdots,$$


assuming the series converges, and we have no guarantee that it always does.
For example, if $x$ is equal to $(1, 1/\sqrt{2}, 1/\sqrt{3}, \ldots)$, then $x \cdot x = \sum_{n=1}^{\infty} \frac{1}{n}$ is
infinite. Again let us remedy this situation by insisting that vectors have coefficients
that decrease to 0 fast enough.

Having done this, we may go on to see what infinite matrices would look like. 
They would take an infinite vector and return another infinite one, as follows, 


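$$\begin{pmatrix} a_{11} & a_{12} & \cdots \\ a_{21} & a_{22} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \end{pmatrix}
= \begin{pmatrix} y_1 \\ y_2 \\ \vdots \end{pmatrix}$$


where $y_1 = a_{11} x_1 + a_{12} x_2 + \cdots = \sum_{n=1}^{\infty} a_{1n} x_n$, etc. Perhaps we may need to have the
rows of the matrix vanish sufficiently rapidly as we go down and to the right of the
matrix.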

Once again, many familiar ideas from finite matrices seem to generalize to this 
infinite setting. Not only is it possible to add and multiply these matrices without
any inherent difficulty, but methods such as Gaussian elimination can also be
applied in principle. There seems to be no intrinsic problem to working with
infinite-dimensional linear algebra.

It may come as a slight surprise to the reader that in fact he/she has already met 
these infinite vectors before! When a function is expanded as a MacLaurin series 

$$f(x) = f(0) + f'(0)x + \tfrac{1}{2!} f''(0) x^2 + \cdots,$$

it is in effect written as an infinite sum of the basis vectors (or functions) $1, x, x^2, \ldots$,
each with the numerical coefficients $f(0)$, $f'(0)$, $\tfrac{1}{2!} f''(0)$, $\ldots$, respectively.
Adding two functions is the same as adding the two infinite vectors (or series); 
and multiplying by a number is equivalent to multiplying each coefficient by the 
same number. What about infinite matrices? Take a look at the following form of 
differentiation, here written in matrix form, 
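
Since differentiation sends $x^k$ to $k x^{k-1}$, its matrix with respect to the basis $1, x, x^2, \ldots$ has the entries $1, 2, 3, \ldots$ just above the diagonal and zeros elsewhere. The following Python sketch is only an illustration, under the assumption that the series is truncated to finitely many coefficients; the names `differentiation_matrix` and `mat_vec` are ours and do not come from the text.

```python
# Illustrative assumption: a function is stored as the list of its MacLaurin
# coefficients [a0, a1, a2, ...], truncated to finitely many terms.

def differentiation_matrix(n):
    """Top-left n x n corner of the matrix of d/dx in the basis 1, x, x^2, ..."""
    # Differentiating x^k gives k x^(k-1), so column k carries the entry k in row k-1.
    return [[col if col == row + 1 else 0 for col in range(n)] for row in range(n)]

def mat_vec(matrix, vector):
    """Ordinary matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# f(x) = 5 + 3x + 2x^2 + 4x^3  has derivative  f'(x) = 3 + 4x + 12x^2
coefficients = [5, 3, 2, 4]
D = differentiation_matrix(len(coefficients))
derivative = mat_vec(D, coefficients)
print(derivative)  # [3, 4, 12, 0]
```

The truncation is harmless for polynomials; for a genuine infinite series each output coefficient still depends on only one input coefficient, which is why this particular infinite matrix causes no convergence problems.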
And just as there are various bases that can be used in geometry, so there are
different ways to expand functions, the most celebrated being the Fourier series

$$f(x) = a_0 + a_1 \cos x + b_1 \sin x + a_2 \cos 2x + b_2 \sin 2x + \cdots.$$

The basis vectors are now $1$, $\cos x$, $\sin x$, $\cos 2x$, etc. What matrix does differentiation
take with respect to this basis?

If we accept that all this is possible and makes sense, we are suddenly made 
aware of a new unification of mathematics: certain differential equations are matrix 
equations, the Fourier and Laplace transforms can be thought of as generalized 
“matrices” mapping a function (vector) to another function, etc. Solving a linear 
differential equation, and finding the inverse Fourier transform, are equivalent to 
finding the inverse of their “matrices”. 

Do we gain anything by converting to a matrix picture? Apart from the practical 
matter that there are many known algorithms that deal with matrices, a deeper reason 
is that linear algebra and geometry give insights to the subject of functions that we 
may not have had before. Euclid’s theorems may possibly still be valid for functions 
if we think of them as ‘points’ in an infinite-dimensional vector space. We wake up
to the possibility of a function being perpendicular to another, for example, and that 
a function may have a closest function in a “plane” of functions. 

Conversely, ideas from classical analysis may be transferred to linear algebra. 
Since square matrices can be multiplied with themselves, can the geometric series
$1 + A + A^2 + \cdots$ make sense for matrices? Perhaps one can take the exponential
of a matrix $e^A := 1 + A + A^2/2! + A^3/3! + \cdots$. There’s no better way than to take
the plunge and try it out, say on the differentiation ‘matrix’ $D$,

$$e^D f(x) = (1 + D + D^2/2 + \cdots) f(x) = f(x) + f'(x) + f''(x)/2 + \cdots = f(x + 1)$$

(by a Taylor expansion around $x$). The “matrix” $e^D$ certainly has meaning: it performs
an unexpected, if mundane, operation: it shifts the function $f$ one step to the
left! Again, suppose we have the equation $y' - 2y = e^x$; manipulating the derivative
blindly as if it were a number gives a correct solution (but not the general
solution)


$$y = (D - 2)^{-1} e^x = -\tfrac{1}{2}(1 + D/2 + D^2/4 + \cdots) e^x = -e^x.$$

Yet repeating for the equation $y' - 2y = e^{2x}$ fails to give a meaningful solution.
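
The shift property of $e^D$ can be tried out exactly on polynomials, where the series $1 + D + D^2/2! + \cdots$ terminates by itself because high enough derivatives vanish. The sketch below is an illustration only: it assumes a polynomial is stored as its list of coefficients, and the helper names `derivative`, `exp_D` and `evaluate` are ours.

```python
from fractions import Fraction
from math import factorial

def derivative(coeffs):
    """Coefficients of p' given the coefficients [a0, a1, ...] of p (same length)."""
    return [k * a for k, a in enumerate(coeffs)][1:] + [Fraction(0)]

def exp_D(coeffs):
    """Apply 1 + D + D^2/2! + ... to a polynomial; the sum is finite because
    D^k p = 0 once k exceeds the degree of p."""
    total = [Fraction(0)] * len(coeffs)
    term = [Fraction(a) for a in coeffs]        # D^0 p
    for k in range(len(coeffs)):
        total = [t + d / factorial(k) for t, d in zip(total, term)]
        term = derivative(term)                 # D^(k+1) p
    return total

def evaluate(coeffs, x):
    return sum(a * x**k for k, a in enumerate(coeffs))

p = [1, 0, -3, 2]        # p(x) = 1 - 3x^2 + 2x^3
q = exp_D(p)             # expected to represent p(x + 1)
print([evaluate(q, x) == evaluate(p, x + 1) for x in range(-2, 3)])
```

Exact rational arithmetic (`Fraction`) is used so that the comparison is an equality rather than an approximation.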

In fact, historically, the subject of functional analysis as we know it started in 
the 19th century when mathematicians started to notice the connections between 
differential equations and matrices. For example, the equation 


$$y'(x) = a(x) y(x) + g(x)$$
can be written in equivalent form as

$$y(x) = \int_{x_0}^{x} a(s) y(s)\, ds + g(x), \qquad (1.1)$$

where $g$ in (1.1) now denotes the integral of the forcing term plus the initial value $y(x_0)$.


The integral $\int_{x_0}^{x} a(s) y(s)\, ds$ is an infinitesimal version of a sum $\sum_i a_i y_i$ and can be
thought of as a transformation of $y(x)$. Equation (1.1) is akin to a matrix equation
$y = Ay + b$, and we are tempted to try out the solution $y = (1 - A)^{-1} b =
(1 + A + A^2 + \cdots) b$.
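
For an ordinary finite matrix the tempting formula can be tested directly. The following sketch assumes a $2 \times 2$ matrix $A$ that is ‘small’ enough for the geometric series to converge; the matrix, the vector $b$, and the function names are illustrative choices of ours, not taken from the text.

```python
def mat_vec(A, v):
    """Ordinary matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def neumann_solve(A, b, terms=60):
    """Approximate y = (1 + A + A^2 + ...) b by summing the first `terms` terms."""
    y = [0.0] * len(b)
    power_b = list(b)                       # holds A^k b, starting with k = 0
    for _ in range(terms):
        y = [yi + pi for yi, pi in zip(y, power_b)]
        power_b = mat_vec(A, power_b)
    return y

A = [[0.2, 0.1],
     [0.3, 0.4]]                            # assumed 'small', so the series converges
b = [1.0, 2.0]
y = neumann_solve(A, b)

# y should satisfy y = Ay + b up to rounding error
residual = [yi - (Ayi + bi) for yi, Ayi, bi in zip(y, mat_vec(A, y), b)]
print(y, residual)
```

If $A$ is replaced by a matrix with large entries, the partial sums grow without bound, which is a first hint that the formula needs a notion of size for matrices.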

Nonetheless, technical problems in carrying out this generalization arise immediately:
are the components of an infinite vector unique? They would be if the vectors $e_n$
are in some sense ‘perpendicular’ to each other. But what is this supposed to mean, 
say for the MacLaurin series? After all, there do exist non-zero functions whose 
MacLaurin coefficients are all zero. The question of whether the Fourier coefficients 
are unique took almost a century to answer! And extra care must be taken to handle 
infinite vectors. For example, let 


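$$\begin{aligned}
v_1 &:= (1, 0, 0, 0, \ldots) \\
v_2 &:= (-1, 1, 0, 0, \ldots) \\
v_3 &:= (0, -1, 1, 0, \ldots) \\
v_4 &:= (0, 0, -1, 1, \ldots) \\
&\;\;\vdots
\end{aligned}$$

It seems clear that


$$v_1 + v_2 + v_3 + \cdots = 0,$$

yet the size of the sum of the first $n$ vectors never diminishes:

$$v := v_1 + \cdots + v_n = (0, \ldots, 0, 1, 0, \ldots) \;\Rightarrow\; v \cdot v = 1.$$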
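
This telescoping behaviour is easy to observe numerically. The sketch below assumes the vectors are truncated to a fixed number of entries (the length 12 is an arbitrary choice of ours), and the helper names are illustrative.

```python
def v(k, length):
    """k-th vector of the example (k = 1, 2, ...), truncated to `length` entries."""
    vec = [0] * length
    vec[k - 1] = 1
    if k >= 2:
        vec[k - 2] = -1
    return vec

def partial_sum(n, length):
    """v_1 + ... + v_n, computed entry by entry."""
    total = [0] * length
    for k in range(1, n + 1):
        total = [t + c for t, c in zip(total, v(k, length))]
    return total

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

for n in (1, 5, 10):
    s = partial_sum(n, 12)
    print(n, s, dot(s, s))
```

Each fixed coordinate of the partial sums eventually becomes 0, yet every partial sum has size 1: whether the series ‘converges to 0’ depends on which meaning of convergence is adopted.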

Because of these pitfalls, we need to proceed with extra caution. It turns out that
many of the equations written above are capable of different interpretations and so 
cannot be taken to be literally true. 

These considerations force us to consider the meaning of convergence. The reader 
may already be familiar with the real line $\mathbb{R}$, in which one can speak about convergence
of sequences of numbers, and continuity of functions. Some of the main results
in real analysis are

(i) Cauchy sequences converge,

(ii) for continuous functions, if $x_n \to x$ then $f(x_n) \to f(x)$,

(iii) continuous real functions are bounded on intervals of type [a, b] and have the 
intermediate value property. 
We seek generalizations of these to $\mathbb{R}^N$ and possibly to infinite-dimensional spaces.
We do seem to have an intuitive sense of what it means for vectors to converge
$x_n \to x$, but can it be made rigorous? Is it true that if $x_n \to x$ and $y_n \to y$
then $f(x_n, y_n) \to f(x, y)$ when $f$ is a continuous function? Are continuous real
functions bounded on “rectangles” $[a, b] \times [c, d]$, and is the latter the correct analogue
of an “interval”? Since vector functions are common in applications, it is important
to show how these theorems apply in a much more general setting than $\mathbb{R}$, and this
can be achieved by stripping off any inessential structure, such as its order ($\le$). As
we proceed to answer these questions, we will see that the real line is very special
indeed. Intervals play several roles in real analysis, roles that become distinct
in $\mathbb{R}^N$, where we speak instead of connected sets, balls, etc.

The book is divided into three parts: the first considers convergence, continuity, 
and related concepts, the second part treats infinite vectors and their matrices, and 
the third part tackles infinite series of matrices and more. 

Functional analysis is a rich subject because it combines two large branches of 
mathematics: the topological branch concerns itself with convergence, continuity, 
connectivity, boundedness, etc.; the algebraic branch concerns itself with operations, 
groups, rings, vectors, etc. Problems from such different fields as matrix algebras, 
differential equations and approximation theory, can be unified in one framework. As 
in most of mathematics, there are two streams of study: the abstract theory deduces 
the general results, starting from axioms, while the concrete examples are shown 
to be part of this theory. Inevitably, the former appears elegant and powerful, and 
the latter full of detail and perhaps daunting. Nonetheless, both pedagogically and 
historically, it is often by examples that one understands the abstract, and by the 
theory that one makes headway with concrete problems. 

Most sections contain a number of worked out examples, notes, and exercises: it 
is suggested that a section is first read in full, including its propositions and exercises. 
These exercises are an essential part of the book; they should be worked out before 
moving to the next section (some hints and answers are provided in the appendix,
and many worked solutions can be found on the book’s website,
http://www.springer.com/mathematics/analysis/book/978-3-319-06727-8). To prevent the exercises from
becoming a litany of “Show ...” and “Prove ...”, these terms have frequently been 
omitted, partly to instil an attitude of critical reading. As a guide, the notes and 
exercises have been marked as follows: 

► refers to important notes and results; 

* more advanced or difficult exercises that can be skipped on a first reading; 

O side remarks that can be skipped without losing any essential ideas. 
1.1 Preliminaries

Familiarity with the following mathematical notions and notation is assumed: 


Logic and Sets 

The basic logical symbols are $\Rightarrow$ (implies), NOT, AND, OR, as well as the quantifiers
$\exists$ (there exists) and $\forall$ (for all). The reader should be familiar with the basic proof
strategies, such as proving $\phi \Rightarrow \psi$ by its contrapositive (NOT $\psi$) $\Rightarrow$ (NOT $\phi$), and
proofs by contradiction. The negation of $\forall x\, \phi_x$ is $\exists x\,$(NOT $\phi_x$); and NOT ($\exists x\, \phi_x$)
is the same as $\forall x\,$(NOT $\phi_x$). The symbol := is used to define the left-hand symbol
as the right-hand expression, e.g. $e := \sum_{n=0}^{\infty} \frac{1}{n!}$.

A set consists of elements, and $x \in A$ denotes that $x$ is an element of the set $A$.
The empty set $\varnothing$ contains no elements, so $x \in \varnothing$ is a contradiction.

The following sets of numbers are the foundational cornerstones of mathematics:
the natural numbers $\mathbb{N} = \{0, 1, \ldots\}$, the integers $\mathbb{Z}$, the rational numbers $\mathbb{Q}$, the real
numbers $\mathbb{R}$, and the complex numbers $\mathbb{C}$. The induction principle applies for $\mathbb{N}$,

If $A \subseteq \mathbb{N}$ and $0 \in A$ and $\forall n,\ (n \in A \Rightarrow n + 1 \in A)$ then $A = \mathbb{N}$.

Although variables should be quantified to make sense of statements, as in 
$\forall a \in \mathbb{Q},\ a^2 \ne 2$, in practice one often takes shortcuts to avoid repeating the obvious.
This book uses the convention that if a statement mentions variables without accompanying
quantifiers, say, $\|x + y\| \le \|x\| + \|y\|$, these are assumed to be $\forall x$, $\forall y$, etc.,
in the space under consideration. Natural numbers are usually, but not exclusively,
denoted by the variables $m, n, N, \ldots$, real numbers by $a, b, \ldots$, and complex numbers
by $z, w, \ldots$. An unspecified $X$ (or $Y$) refers to a metric space, a normed space,
or a Banach algebra, depending on the chapter.

Sets are often defined in terms of a property, $A := \{x \in X : \phi_x\}$, where $X$ is
a given ‘universal set’ and $\phi_x$ a statement about $x$. For example, $\mathbb{R}^+ := \{x \in \mathbb{R} :
x \ge 0\}$.

$A \subseteq B$ denotes that $A$ is a subset of $B$, i.e., $x \in A \Rightarrow x \in B$; $A \subset B$ means
$A \subseteq B$ but $A \ne B$. A “non-trivial” or “proper” subset of $X$ is one which is not
$\varnothing$ or $X$. “Nested sets” are contained in each other as in $A_1 \subseteq A_2 \subseteq A_3 \subseteq \cdots$ or
$\cdots \subseteq A_2 \subseteq A_1$.

The complement of a set $A$ is denoted by $X \setminus A$, or by $A^c$ for short; $A^{cc} = A$, and
$A \subseteq B \Leftrightarrow B^c \subseteq A^c$. $A \cap B$ and $A \cup B$ are the intersection and union of two sets,
respectively. Two sets are “disjoint” when $A \cap B = \varnothing$. De Morgan’s laws state that
$(A \cup B)^c = A^c \cap B^c$ and $(A \cap B)^c = A^c \cup B^c$. In general, the union and intersection
of a number of sets are denoted by $\bigcup_i A_i$ and $\bigcap_i A_i$ (where the range of the index
$i$ is understood by the context). A “cover” of $A$ is a collection of sets $\{B_i : i \in I\}$
whose union includes $A$, i.e., $A \subseteq \bigcup_i B_i$; a “partition” of $X$ is a cover by disjoint
subsets of $X$.
Pairs of elements are denoted by $(x, y)$, or as $\binom{x}{y}$, generalized to finite ordered
lists $(x_1, \ldots, x_N)$. The product of two sets is the set of pairs

$$X \times Y := \{(x, y) : x \in X,\ y \in Y\}$$

in particular $X^2 := X \times X = \{(x, y) : x, y \in X\}$, and by analogy

$$X^N := \{(x_1, \ldots, x_N) : x_i \in X,\ i = 1, \ldots, N\}.$$

An important example is the plane $\mathbb{R}^2$, whose points are pairs of real numbers (called
“coordinates”). The unit disk is $\{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \le 1\}$; its perimeter is the
unit circle $S^1 := \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$.


Functions 


A function $f : X \to Y$, $x \mapsto f(x)$, assigns, for every input $x \in X$, a unique
output element $f(x) \in Y$. (It need not be an explicit procedure.) $X$ is called the
“domain” of $f$ and $Y$ its “codomain”. Functions are also referred to as “maps” or
“transformations”. To avoid being too pedantic, we sometimes say, for example, “the
function $x \mapsto e^x$” without reference to the domain and codomain, when these are
obvious from the context. The “image” of a subset $A \subseteq X$, and the “pre-image” of
a subset $B \subseteq Y$ are

$$fA := \{f(a) \in Y : a \in A\}, \qquad f^{-1}B := \{a \in X : f(a) \in B\}.$$


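The image of $f$ is $\operatorname{im} f := fX$. It is easy to show that for any number of sets $A_i$,

$$f \bigcup_i A_i = \bigcup_i f A_i, \qquad f \bigcap_i A_i \subseteq \bigcap_i f A_i,$$
$$f^{-1} \bigcup_i A_i = \bigcup_i f^{-1} A_i, \qquad f^{-1} \bigcap_i A_i = \bigcap_i f^{-1} A_i.$$

The set of functions $f : X \to Y$ is denoted by $Y^X$.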
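
These image and pre-image identities can be checked on a small finite example. The sketch below is only an illustration, with the squaring map on a handful of integers as an assumed test function; the helpers `image` and `preimage` are our own names.

```python
def image(f, A):
    """The set fA = { f(a) : a in A }."""
    return {f(a) for a in A}

def preimage(f, domain, B):
    """The set of points a in the domain with f(a) in B."""
    return {a for a in domain if f(a) in B}

X = set(range(-3, 4))
square = lambda x: x * x

A1, A2 = {-2, -1, 0}, {0, 1, 2, 3}
# Images respect unions, but for intersections only an inclusion holds.
print(image(square, A1 | A2) == image(square, A1) | image(square, A2))
print(image(square, A1 & A2), image(square, A1) & image(square, A2))

B1, B2 = {0, 1, 4}, {4, 9}
# Pre-images respect both unions and intersections.
print(preimage(square, X, B1 | B2) == preimage(square, X, B1) | preimage(square, X, B2))
print(preimage(square, X, B1 & B2) == preimage(square, X, B1) & preimage(square, X, B2))
```

The inclusion for intersections is strict here because squaring is not 1-1: the points $-1 \in A_1$ and $1 \in A_2$ share the image 1 although they are not both in $A_1 \cap A_2$.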

Some functions can be composed together $f \circ g(x) := f(g(x))$ whenever the
image of $g$ lies in the domain of $f$. Composing with the trivial identity function
$I : X \to X$, $x \mapsto x$ (one for each set $X$), has no effect, $f \circ I = f$.

The restriction of a function $f : X \to Y$ to a subset $M \subseteq X$ is the function
$f|_M : M \to Y$ which agrees with $f$ on $M$, i.e., $f|_M(x) = f(x)$ whenever $x \in M$.
Conversely, an extension of a function is another function $\tilde{f} : A \to Y$ where $X \subseteq A$,
such that $\tilde{f}(x) = f(x)$ whenever $x \in X$.

The reader should be familiar with the functions $x \mapsto -x$, $x^n$, $|x|$, for $x \in \mathbb{R}$
or $\mathbb{C}$; $(x, y) \mapsto x + y$, $xy$, with domain $\mathbb{R}^2$ or $\mathbb{C}^2$; $(x, y) \mapsto x/y$ for $y \ne 0$; and
$(x_1, \ldots, x_n) \mapsto \max(x_1, \ldots, x_n)$ for real numbers $x_i$. In particular, the absolute
value function satisfies


$$|a + b| \le |a| + |b|, \quad |a| \ge 0, \quad |a| = 0 \Leftrightarrow a = 0, \quad |ab| = |a||b|.$$


Conjugation is the function $\bar{\ } : \mathbb{C} \to \mathbb{C}$, $a + ib \mapsto a - ib$; its properties are

$$\overline{z + w} = \bar{z} + \bar{w}, \quad \overline{zw} = \bar{z}\,\bar{w}, \quad \bar{\bar{z}} = z, \quad \bar{z} z = |z|^2.$$


The Kronecker delta function is $\delta_{ij} := \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases}$. The exponential function
$x \mapsto e^x$, $\mathbb{R} \to \mathbb{R}$, may be defined by $e^x := \sum_{n=0}^{\infty} \frac{x^n}{n!}$; it satisfies $e^0 = 1$ and
$e^x > 0$.


Sequences are functions $x : \mathbb{N} \to X$, but $x(n)$ is usually written as $x_n$, and the
whole sequence $x$ is referred to by $(x_n)_{n \in \mathbb{N}}$ or $(x_0, x_1, \ldots)$ or even just $(x_n)$; real or
complex-valued sequences are denoted by bold symbols, $\boldsymbol{x}$. For example $(1/2^n)$ is the
sequence $(1, 1/2, 1/4, \ldots)$, which is shorthand for $0 \mapsto 1$, $1 \mapsto 1/2$, etc. It is important
to realize that $(x_n)$ is a function and not a set of values, e.g. $(1, -1, 1, -1, \ldots)$
is quite different from $(-1, 1, 1, 1, \ldots)$ and $(-1, 1, -1, 1, \ldots)$, even if they have
the same set of values. The set of real-valued sequences is denoted by $\mathbb{R}^{\mathbb{N}} := \{x :
\mathbb{N} \to \mathbb{R}\}$, and of the complex-valued sequences by $\mathbb{C}^{\mathbb{N}}$. Functions $x : \mathbb{Z} \to X$ are
also sometimes called sequences and are denoted by $(x_n)_{n \in \mathbb{Z}}$.

Polynomials (of one variable) are functions $p : \mathbb{C} \to \mathbb{C}$ that are a finite number of
compositions of additions and multiplications only; every polynomial can be written
in the standard form $p(z) = a_n z^n + \cdots + a_1 z + a_0$ ($a_i \in \mathbb{C}$, $a_n \ne 0$ unless $p = 0$);
$n$ is called the degree of $p$.

A function $f : X \to Y$ is 1-1 (“one-to-one”) or injective when $f(x) = f(y) \Rightarrow
x = y$; it is onto or surjective when $fX = Y$. A bijection is a function which is both
1-1 and onto; every bijection has an inverse function $f^{-1}$, whereby $f^{-1} \circ f(x) = x$,
$f \circ f^{-1}(y) = y$.

Sets may be finite, countably infinite, or uncountable, depending on whether there
exists a bijection from the set to, respectively, (i) a set $\{1, \ldots, n\}$ for some natural
number $n$, or (ii) $\mathbb{N}$, or (iii) otherwise. In simple terms, a set is countable when its
elements can be listed, and finite when the list terminates. If $A$, $B$ are countable sets
then so is $A \times B$; more generally, the union of the countable sets $A_n$, $n = 0, 1, 2, \ldots$,
is again countable:

$$\begin{aligned}
A_0 &= \{a_{00}, a_{01}, a_{02}, \ldots\} \\
A_1 &= \{a_{10}, a_{11}, a_{12}, \ldots\} \\
A_2 &= \{a_{20}, a_{21}, \ldots\} \\
&\;\;\vdots
\end{aligned}$$

$$\bigcup_{n=0}^{\infty} A_n = \{a_{00}, a_{01}, a_{10}, a_{02}, \ldots\}$$
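
The listing of the union follows the anti-diagonals of the table of elements $a_{ij}$. A minimal Python sketch of this enumeration, producing the index pairs $(i, j)$ in the order shown above (the generator name `diagonal_indices` is ours), is:

```python
from itertools import count, islice

def diagonal_indices():
    """Yield every pair (i, j) of natural numbers exactly once, sweeping the
    anti-diagonals i + j = 0, 1, 2, ... of the infinite table."""
    for s in count(0):
        for i in range(s + 1):
            yield (i, s - i)

print(list(islice(diagonal_indices(), 6)))
# [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]
```

Every pair $(i, j)$ is reached after finitely many steps, namely within the first $(i + j + 1)(i + j + 2)/2$ terms, which is what makes the union listable. If the sets $A_n$ overlap, repeated elements are simply skipped.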


A relation is a statement about pairs of elements taken from $X \times Y$, e.g. $x = y^2 + 1$
for $(x, y) \in \mathbb{R}^2$. An equivalence relation $\approx$ on a set $X$ is one which is

reflexive $x \approx x$,

symmetric $x \approx y \Leftrightarrow y \approx x$,
transitive $x \approx y \approx z \Rightarrow x \approx z$.

An equivalence relation induces a partition of the set $X$ into equivalence classes
$[a] := \{x \in X : x \approx a\}$.

An order $\le$ is a relation which is reflexive, transitive and anti-symmetric, $x \le
y \le x \Rightarrow x = y$. One writes $x < y$ when $x \le y$ but $x \ne y$. A linear order is
one which also satisfies $x \le y$ OR $y \le x$. A number $x$ is “positive” when $x \ge 0$,
whereas “strictly positive” means $x > 0$. An “upper bound” of a set $A$ is a number $b$
which is larger than or equal to every $a \in A$. A “least upper bound”, denoted $\sup A$, is the smallest
such upper bound (if it exists), i.e., every upper bound of $A$ is greater than or equal to
$\sup A$. There are analogous definitions of lower bounds and greatest lower bounds,
denoted $\inf A$.

A group is a set $G$ with an associative operation and an identity element 1, such
that each element $x \in G$ has an inverse element $x^{-1}$,

$$x(yz) = (xy)z, \qquad 1x = x = x1, \qquad x x^{-1} = 1 = x^{-1} x.$$

A subgroup is a subset of $G$ which is itself a group with the same operation and
identity. A normal subgroup is a subgroup $H$ such that $x^{-1} H x \subseteq H$ for all $x \in G$.
An example of a group is the set $\mathbb{C} \setminus \{0\}$ with the operation of multiplication; the set
$S := \{e^{i\theta} : \theta \in \mathbb{R}\}$ is a subgroup since $e^{i\theta} e^{i\phi} = e^{i(\theta + \phi)}$, $1 = e^{i0}$, $(e^{i\theta})^{-1} = e^{-i\theta}$
are all in $S$.

A field $\mathbb{F}$ is a set of numbers, such as $\mathbb{Q}$, $\mathbb{R}$, or $\mathbb{C}$, whose elements can be added
and multiplied together associatively, commutatively, and distributively,

$$\forall a, b, c \in \mathbb{F}, \quad (a + b) + c = a + (b + c), \quad (ab)c = a(bc),$$

$$a + b = b + a, \quad ab = ba,$$

$$(a + b)c = ac + bc,$$

there is a zero 0 and an identity 1, every element $a$ has an additive inverse, or negative,
$-a$, and every $a \ne 0$ has a multiplicative inverse, or reciprocal, $1/a$,

$$0 + a = a, \qquad 1a = a,$$
$$a + (-a) = 0, \qquad a \cdot \tfrac{1}{a} = 1 \quad (a \ne 0).$$


The real number space $\mathbb{R}$ is that unique field which has a linear order $\le$ such that

(a) The order is compatible with the field operations: $a \le b \Rightarrow a + c \le b + c$, and $0 \le a, b \Rightarrow 0 \le ab$.

(b) Every non-empty subset with an upper bound has a least upper bound.
The intervals are the subsets 


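$$\begin{aligned}
[a, b] &:= \{x \in \mathbb{R} : a \le x \le b\}, & ]a, b] &:= \{x \in \mathbb{R} : a < x \le b\}, \\
[a, b[ &:= \{x \in \mathbb{R} : a \le x < b\}, & ]a, b[ &:= \{x \in \mathbb{R} : a < x < b\}, \\
[a, \infty[ &:= \{x \in \mathbb{R} : a \le x\}, & ]a, \infty[ &:= \{x \in \mathbb{R} : a < x\}, \\
]-\infty, a] &:= \{x \in \mathbb{R} : x \le a\}, & ]-\infty, a[ &:= \{x \in \mathbb{R} : x < a\},
\end{aligned}$$

where $a < b$ are fixed real numbers. The real numbers satisfy the Archimedean
property


$$\forall x > 0,\ \exists n \in \mathbb{N},\ n > x.$$

The proof is simple: If the set $\mathbb{N}$ had an upper bound in $\mathbb{R}$ then it would have a least
upper bound $\alpha$; by definition, this implies that $\alpha - 1$ is not an upper bound, meaning
there is a number $n \in \mathbb{N}$ such that $n > \alpha - 1$; yet $n + 1 \le \alpha$. This contradiction
shows that no $x \in \mathbb{R}$ is an upper bound of $\mathbb{N}$: there is an $n \in \mathbb{N}$ such that $n > x$.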

There is an important set principle that is not usually covered in elementary 
mathematics textbooks: 

The Axiom of Choice: If $\mathcal{A} = \{A_\alpha : \alpha \in I\}$ is a collection of non-empty subsets
of a set $X$ (the index $\alpha$ ranges over some set $I$), then there is a function $f : I \to X$
such that $f(\alpha) \in A_\alpha$.

That is, this ‘choice’ function selects an element from each of the sets $A_\alpha$. The
Axiom of Choice is often used to create a sequence $(x_n)$ from a given list of non-empty
sets $A_n$, with $x_n \in A_n$. It seems obvious that if a set is non-empty then an
element of it can be selected, but the existence of such a procedure cannot be proved
from the other standard set axioms.


Part I 
Metric Spaces 



Chapter 2 

Distance 


Metric spaces can be thought of as very basic spaces, with only a few axioms, where 
the ideas of convergence and continuity exist. The fundamental ingredient that is 
needed to make these concepts rigorous is that of a distance, also called a metric, 
which is a measure of how close elements are to each other. 

Definition 2.1 

A distance (or metric) on a set $X$ is a function

$$d : X^2 \to \mathbb{R}^+, \qquad (x, y) \mapsto d(x, y)$$

such that the following properties (called axioms) hold for all $x, y, z \in X$,

(i) $d(x, y) \le d(x, z) + d(z, y)$ (Triangle Inequality),

(ii) $d(y, x) = d(x, y)$ (Symmetry),

(iii) $d(x, y) = 0 \Leftrightarrow x = y$.
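
On a finite sample of points the three axioms can be tested by brute force. The sketch below is only a sanity check under that assumption (passing it on a sample does not prove that a function is a metric on the whole space), and the function name `is_metric` and the tolerance `tol` are our own choices.

```python
from itertools import product

def is_metric(points, d, tol=1e-12):
    """Check axioms (i)-(iii) of Definition 2.1, plus d >= 0, on a finite set."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0 or abs(d(x, y) - d(y, x)) > tol:     # positivity and (ii)
            return False
        if (d(x, y) <= tol) != (x == y):                     # (iii)
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:                # (i)
            return False
    return True

points = [-2.0, -0.5, 0.0, 1.0, 3.5]
print(is_metric(points, lambda a, b: abs(a - b)))      # True
print(is_metric(points, lambda a, b: (a - b) ** 2))    # False: the triangle inequality fails
```

The squared difference fails because, for example, the ‘distance’ from $-2$ to $3.5$ is $30.25$, larger than the sum $4 + 12.25$ obtained by passing through $0$.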


A metric space is not just a set, in which the elements have no relation to each
other, but a set $X$ equipped with a particular structure, its distance function $d$. One
can emphasize this by denoting the metric space by the pair $(X, d)$, although it is
more convenient to denote different metric spaces by different symbols such as $X$, $Y$,
etc.

In what follows, $X$ will denote an abstract set with a distance, not necessarily $\mathbb{R}$
or $\mathbb{R}^N$, although these are of the most immediate interest. We still call its elements
“points”, whether they are in reality geometric points, sequences, or functions. What
matters, as far as metric spaces are concerned, is not the internal structure of its
points, but their outward relation to other points.
Maurice Fréchet (1878-1973) studied under Hadamard (who
had proved the prime number theorem and had succeeded
Poincaré) and Borel at the University of Paris (École Normale
Supérieure); his 1906 thesis developed “abstract analysis”, an
axiomatic approach to abstract functions that allows the Euclidean
concepts of convergence and distance, as well as the
usual algebraic operations, to be applied to functions. Many
terms, such as metric space, completeness, compactness etc.,
are due to him.

Fig. 2.1 Fréchet



Although most distance functions treated in this book are of the type $d(x, y) =
|x - y|$, as for $\mathbb{R}$, the point of studying metric spaces in more generality is not only
that there are some exceptions that don’t fit this type, but also to emphasize that 
addition/subtraction is not essential, as well as to prepare the groundwork for even 
more general spaces, called topological spaces, in which pure convergence is studied 
without reference to distances (but which are not covered in this book). 

There are two additional axioms satisfied by some metric spaces that merit particular
attention: complete metrics, which guarantee that their Cauchy sequences
converge, and separable metric spaces whose elements can be handled by approximations.
Both properties are possessed by compact metric spaces, which is what
is often meant when the term “finite” is applied in a geometric sense. These are
considered in later sections.

Easy Consequences 

1. $d(x, z) \ge |d(x, y) - d(z, y)|$.

2. If $x_1, \ldots, x_n$ are points in $X$, then by induction on $n$,


$$d(x_1, x_n) \le d(x_1, x_2) + \cdots + d(x_{n-1}, x_n).$$


Examples 2.2 

1. The spaces $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ have the standard distance $d(a, b) := |a - b|$. Check
that the three axioms for a distance are satisfied, making use of the in/equalities
$|s + t| \le |s| + |t|$, $|-s| = |s|$, and $|s| = 0 \Leftrightarrow s = 0$.

2. ► The vector spaces $\mathbb{R}^N$ and $\mathbb{C}^N$ have the standard Euclidean distance defined by
$d(x, y) := \sqrt{\sum_{i=1}^{N} |a_i - b_i|^2}$ for $x = (a_1, \ldots, a_N)$, $y = (b_1, \ldots, b_N)$ (prove
this for $N = 2$).

3. One can define distances on other more general spaces, e.g. we will later show
that the space of real continuous functions $f$ with domain $[0, 1]$ has a distance
defined by $d(f, g) := \max_{x \in [0, 1]} |f(x) - g(x)|$.

4. O The space of ‘shapes’ in $\mathbb{R}^2$ (roughly speaking, subsets that have an area) has
a metric $d(A, B)$ defined as the area of $(A \cup B) \setminus (A \cap B)$.
5. ► Any subset of a metric space is itself a metric space (with the ‘inherited’ or
‘induced’ distance). (The three axioms are such that they remain valid for points
in a subset of a metric space.)

6. ► The product of two metric spaces, $X \times Y$, can be given several distances, none
of which has a natural preference. Two of them are the following

$$D_1\left(\binom{x_1}{y_1}, \binom{x_2}{y_2}\right) := d_X(x_1, x_2) + d_Y(y_1, y_2),$$

$$D_\infty\left(\binom{x_1}{y_1}, \binom{x_2}{y_2}\right) := \max(d_X(x_1, x_2), d_Y(y_1, y_2)).$$


For convenience, we choose $D_1$ as our standard metric for $X \times Y$, except for $\mathbb{R}^N$
and $\mathbb{C}^N$, for which we take the Euclidean one.

Proof for $D_1$: Positivity of $D_1$ and axiom (ii) are obvious. To prove axiom (iii),
$D_1(\boldsymbol{x}_1, \boldsymbol{x}_2) = 0$ implies $d_X(x_1, x_2) = 0 = d_Y(y_1, y_2)$, so $x_1 = x_2$, $y_1 = y_2$, and
$\boldsymbol{x}_1 = \binom{x_1}{y_1} = \binom{x_2}{y_2} = \boldsymbol{x}_2$. As for the triangle inequality,

$$\begin{aligned}
D_1(\boldsymbol{x}_1, \boldsymbol{x}_2) &= d_X(x_1, x_2) + d_Y(y_1, y_2) \\
&\le d_X(x_1, x_3) + d_X(x_3, x_2) + d_Y(y_1, y_3) + d_Y(y_3, y_2) \\
&= D_1(\boldsymbol{x}_1, \boldsymbol{x}_3) + D_1(\boldsymbol{x}_3, \boldsymbol{x}_2).
\end{aligned}$$
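
The two product distances of Example 6 can be built from any pair of distance functions. The following sketch assumes, purely for illustration, that $X = \mathbb{R}$ with its standard distance and that $Y$ is a three-letter set with the discrete distance; the names `D1`, `Dinf` and `triangle_holds` are ours.

```python
from itertools import product

def D1(dX, dY):
    """Sum distance on X x Y built from distances dX and dY."""
    return lambda p, q: dX(p[0], q[0]) + dY(p[1], q[1])

def Dinf(dX, dY):
    """Maximum distance on X x Y built from distances dX and dY."""
    return lambda p, q: max(dX(p[0], q[0]), dY(p[1], q[1]))

def triangle_holds(points, D, tol=1e-12):
    """Brute-force check of the triangle inequality on a finite sample."""
    return all(D(p, q) <= D(p, r) + D(r, q) + tol
               for p, q, r in product(points, repeat=3))

d_real = lambda a, b: abs(a - b)                  # standard distance on R
d_discrete = lambda a, b: 0 if a == b else 1      # assumed distance on the labels

# Sample points of X x Y with X = R and Y = {'u', 'v', 'w'}
points = [(x, y) for x in (-1.0, 0.0, 2.5) for y in "uvw"]

for name, D in (("D1", D1(d_real, d_discrete)), ("Dinf", Dinf(d_real, d_discrete))):
    print(name, triangle_holds(points, D), D((0.0, "u"), (2.5, "v")))
```

The same pair of points is at distance $3.5$ under $D_1$ and $2.5$ under $D_\infty$, so the two choices give genuinely different metric spaces on the same set.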


Exercises 2.3 


1. Show that if $d(x, z) > d(z, y)$ then $x \ne y$.

2. Write in mathematical language, 

(a) The subsets A, B are close to within 2 distance units; 

(b) A and B are arbitrarily close. 


3. The set of bytes, i.e., sequences of 0s and 1s (bits) of length 8 (or any length),
has a “Hamming distance” defined as the number of bits where two bytes differ;
e.g. the Hamming distance between 10010111 and 11001101 is 4.


4. Any non-empty set can be given a distance function. The simplest is the discrete
metric $d(x, y) := \begin{cases} 1 & x \ne y \\ 0 & x = y \end{cases}$. Indeed, there are infinitely many other metrics
on the same set (except when there is only one point!); for example, if $d$ is a
distance function then so are $2d$ and $d/(1 + d)$.

(* Not every function of $d$ will do though! The function $d^2$ is not generally a
metric; what properties does $f : \operatorname{im} d \to \mathbb{R}^+$ need to have in order that $f \circ d$
also be a metric?)


5. A set may have several distances defined on it, but each has to be considered as 
a different metric space. For example, the set of positive natural numbers has a 
distance defined by $d(m, n) := |1/m - 1/n|$ (prove!); the metric space associated
with it has very different properties from $\mathbb{N}$ with the standard Euclidean distance.
For example, in this space one can find distinct natural numbers that are arbitrarily
close to each other.

6. Let $n = \pm 2^k 3^r \cdots$ be the prime decomposition of any $n \in \mathbb{Z}$ and define $|n|_2 :=
1/2^k$, $|0|_2 := 0$. Show that $|\cdot|_2$ satisfies the same properties as the standard
absolute value and hence that $d(m, n) := |m - n|_2$ is a distance on $\mathbb{Z}$ (called the
2-adic metric).

7. * Given the distances between $n$ points in $\mathbb{R}^N$, can their positions be recovered?
Can their relative positions be recovered?
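
Two of the distances in these exercises are easy to experiment with. The sketch below is illustrative only: the function names `hamming`, `abs2` and `d2` are assumptions made here, bytes are represented as strings of the characters 0 and 1, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

def hamming(a: str, b: str) -> int:
    """Number of positions at which two bit strings of equal length differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))

def abs2(n: int) -> Fraction:
    """2-adic absolute value: 1/2^k, where 2^k is the largest power of 2 dividing n."""
    if n == 0:
        return Fraction(0)
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return Fraction(1, 2 ** k)

def d2(m: int, n: int) -> Fraction:
    """2-adic distance between two integers."""
    return abs2(m - n)

if __name__ == "__main__":
    print(hamming("10010111", "11001101"))
    print(d2(1, 1025))   # 1 and 1025 differ by 2^10, so they are 2-adically close
```

In the 2-adic metric, integers that differ by a high power of 2 are close together, which is quite unlike the usual distance on the integers.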


2.1 Balls and Open Sets 

The distance function provides an idea of the “surroundings” of a point. Given a 
point a and a number r > 0, we can distinguish between those points ‘near’ to it, 
satisfying d(x, a) < r, and those that are not. 

Definition 2.4 


An (open) ball, with center a and radius r > 0, is the set 
$$B_r(a) := \{\, x \in X : d(x, a) < r \,\}.$$


Despite the name, we should lay aside any preconception we may have of it being 
“round” or symmetric. We are now ready for our first, simple, proposition: 

Proposition 2.5 

Distinct points of a metric space can be separated by disjoint balls, 

$$x \ne y \implies \exists r > 0,\ B_r(x) \cap B_r(y) = \emptyset.$$


Proof If $x \ne y$ then $d(x, y) > 0$ by axiom (iii). Letting $r := d(x, y)/2$, then $B_r(x)$
is disjoint from $B_r(y)$, else we get a contradiction:

$$z \in B_r(x) \cap B_r(y) \implies d(x, z) < r \ \text{AND}\ d(y, z) < r$$
$$\implies d(x, y) \le d(x, z) + d(y, z) < 2r = d(x, y).$$

□

Examples 2.6

1. In $\mathbb{R}$, every ball is an open interval

$$B_r(a) = \{\, x \in \mathbb{R} : |x - a| < r \,\} = \,]a - r, a + r[.$$

Conversely, any open interval of the type $]a, b[$ is a ball in $\mathbb{R}$, namely $B_{(b-a)/2}\big(\tfrac{a+b}{2}\big)$.

2. In $\mathbb{R}^2$, the ball $B_r(a)$ is the disk with center $a$ and radius $r$ without the circular
perimeter.

3. In $\mathbb{Z}$, $B_{1/2}(m) = \{\, n \in \mathbb{Z} : |n - m| < \tfrac12 \,\} = \{m\}$ and $B_2(m) = \{\, m - 1, m, m + 1 \,\}$.

4. It is clear that balls differ depending on the context of the metric space; thus
$B_{1/2}(0) = \,]-\tfrac12, \tfrac12[$ in $\mathbb{R}$, but $B_{1/2}(0) = \{0\}$ in $\mathbb{Z}$.


Open Sets 

We can use balls to explore the relation between a point $x$ and a given set $A$. As the
radius of the ball $B_r(x)$ is increased, one is certain to include some points which
are in $A$ and some points which are not, unless $A = X$ or $A = \emptyset$. So it is more
interesting to investigate what can happen when the radius is small. There are three
possibilities as $r$ is decreased: either $B_r(x)$ contains (i) only points of $A$, or (ii) only
points in its complement $A^c$, or (iii) points of both $A$ and $A^c$, no matter how small
we take $r$.

Definition 2.7 


A point x of a set A is called an interior point of A when it can be “surrounded 
completely” by points of A, i.e., 

$$\exists r > 0,\quad B_r(x) \subseteq A.$$

In this case, $A$ is also said to be a neighborhood of $x$.

A point $x$ (not in $A$) is an exterior point of $A$ when

$$\exists r > 0,\quad B_r(x) \subseteq X \setminus A.$$

All other points are called boundary points of $A$.

Accordingly, the set $X$ is partitioned into three parts: its interior $A^\circ$, its
exterior $(\bar{A})^c$, and its boundary $\partial A$. The set of interior and boundary points of
$A$ is called the closure of $A$ and denoted by $\bar{A} := A^\circ \cup \partial A$.

A set $A$ is open in $X$ when all its points are interior points of it, i.e., $A = A^\circ$
(Fig. 2.2).

Fig. 2.2 The distinction between interior, boundary, and exterior points (panels: a small enough ball around an exterior point; any ball around a boundary point; a small enough ball around an interior point)

Examples 2.8 

1. In $\mathbb{R}$, the intervals $]a, b[$, $[a, b[$, $]a, b]$, and $[a, b]$ have the same interior $]a, b[$,
exterior, and boundary $\{a, b\}$; their closure is $\overline{]a, b[} = [a, b]$.

Proof For any $a < x < b$, let $0 < \epsilon < \min(x - a, b - x)$; then $a < x - \epsilon <
x + \epsilon < b$, that is, $B_\epsilon(x) \subseteq \,]a, b[$; this makes $x$ an interior point of the interval.

For $x < a$, there is an $\epsilon$ with $0 < \epsilon < a - x$ such that $x \in B_\epsilon(x) \subseteq \,]-\infty, a[\, \subseteq \mathbb{R} \setminus [a, b]$.
Similarly, any $x > b$ is an exterior point of the interval.

For $x = a$, any small interval $B_\epsilon(a)$ contains points such as $a + \epsilon/2$ that are
inside the interval, and points outside it, such as $a - \epsilon/2$, making $a$ (and similarly $b$)
a boundary point.

2. ► The following sets are open in any metric space X: 

(a) $X \setminus \{x\}$ for any point $x$. The reason is that any other point $y \ne x$ is separated
from $x$ by disjoint balls (our first proposition); this makes $y$ an interior point
of $X \setminus \{x\}$.

(b) The empty set is open by default, because it does not contain any point. The
whole space $X$ is also open because $B_r(x) \subseteq X$ for any $r > 0$ and $x \in X$.

(c) Balls are open sets in any metric space.

Proof Let $x \in B_r(a)$ be any point in the given ball, meaning $d(x, a) < r$. Let
$\epsilon := r - d(x, a) > 0$; then $B_\epsilon(x) \subseteq B_r(a)$ since for any $y \in B_\epsilon(x)$,

$$d(y, a) \le d(y, x) + d(x, a) < \epsilon + d(x, a) = r.$$

3. ► The least upper bound of a set $A$ in $\mathbb{R}$ is a boundary point of it.

Proof Let $a$ be the least upper bound of $A$. For any $\epsilon > 0$, $a + \epsilon/2$ is an upper
bound of $A$ but does not belong to it (else $a$ would not be an upper bound).
Even if $a \notin A$, the interval $]a - \epsilon/2, a[$ cannot be devoid of elements of
$A$, otherwise $a$ would not be the least upper bound. So the neighborhood $B_\epsilon(a)$
contains elements of both $A$ and $A^c$.

Proposition 2.9 

The set of interior points A° is the largest open set inside A. 


Proof If $B \subseteq A$ then the interior points of $B$ are obviously interior points of $A$,
so $B^\circ \subseteq A^\circ$. In particular every open subset of $A$ lies inside $A^\circ$
(because $B = B^\circ$), and every (open) ball in $A$ lies in $A^\circ$. This implies that if
$B_r(x) \subseteq A$ then $B_r(x) \subseteq A^\circ$, so that every interior point of $A$
is surrounded by other interior points, and $A^\circ$ is open. □

Proposition 2.10 

A set $A$ is open $\iff$ $A$ is the union of balls.



Proof Let $A$ be an open set. Then every point of it is interior, and can be covered by
a ball $B_{r(x)}(x) \subseteq A$. Taking the union over all the points of $A$ gives

$$A = \bigcup_{x \in A} \{x\} \subseteq \bigcup_{x \in A} B_{r(x)}(x) \subseteq A,$$

forcing $A = \bigcup_{x \in A} B_{r(x)}(x)$, a union of balls.

Now let $A := \bigcup_i B_{r_i}(a_i)$ be a union of balls, and let $x$ be any point in $A$.
Then $x$ is in at least one of these balls, say, $B_r(a)$. But balls are open and hence
$x \in B_\epsilon(x) \subseteq B_r(a) \subseteq A$. Therefore $A$ consists of interior points and so is open. □

The early years of research in metric spaces have shown that most of the basic 
theorems about metric spaces can be deduced from the following characteristic prop- 
erties of open sets: 

Theorem 2.11 


Any union of open sets is open. 

The finite intersection of open sets is open.

Proof (i) Consider the union of open sets, $\bigcup_i A_i$. Any $x \in \bigcup_i A_i$ must lie in at least
one of the open sets, say $A_j$. Therefore,

$$x \in B_r(x) \subseteq A_j \subseteq \bigcup_i A_i$$

shows that it must be an interior point of the union.

(ii) It is enough, using induction (show!), to consider the intersection of two open
sets $A \cap B$. Let $x \in A \cap B$, meaning $x \in A$ and $x \in B$, with both sets being open.
Therefore there are open balls $B_{r_1}(x) \subseteq A$ and $B_{r_2}(x) \subseteq B$. The smaller of these
two balls, with radius $r := \min(r_1, r_2)$, must lie in $A \cap B$,

$$x \in B_r(x) = B_{r_1}(x) \cap B_{r_2}(x) \subseteq A \cap B.$$

□ 



Examples 2.12 

1. ► The exterior $(\bar{A})^c = (A^c)^\circ$ of a subset $A$ is open in $X$.

2. $A^\circ = A \setminus \partial A$. So a set is open if, and only if, it does not contain any boundary points.

3. Let $Y \subseteq X$ inherit $X$’s distance. Then $A$ is open in $Y$ if, and only if, $A = U \cap Y$
for some subset $U$ open in $X$.

Proof Care must be taken to distinguish balls in $Y$ from those in $X$: $B^Y_r(x) =
B^X_r(x) \cap Y$. If $A$ is open in $Y$, then by Proposition 2.10,

$$A = \bigcup_{a \in A} B^Y_{r(a)}(a) = \bigcup_{a \in A} B^X_{r(a)}(a) \cap Y = U \cap Y.$$

For the converse, interior points of $U \subseteq X$ which happen to be in $Y$ are interior
points of $A$ as a subset of $Y$,

$$y \in B^X_r(y) \subseteq U \implies y \in B^X_r(y) \cap Y \subseteq U \cap Y = A.$$


Limit Points 

It may happen that a point a of a set A is surrounded by points not in A, that is, there 
is a ball B r (a) which contains no points of A other than a itself. We call such points 
isolated points. The property that a point cannot be isolated from the rest of A is 
captured by the following definition:

Definition 2.13


A point $b$ (not necessarily in $A$) is a limit point of a set $A$ when every ball
around it contains other points of $A$,

$$\forall \epsilon > 0,\ \exists a \ne b,\quad a \in A \cap B_\epsilon(b).$$


Thus every point of A is either a limit point or an isolated point of A. 

Exercises 2.14 

1. In R, the set { a } has no interior points, a single boundary point a, and all other 
points are exterior. It is not an open set in R. There are ever smaller open sets 
that contain a, but there is no smallest one. 

2. In $\mathbb{R}$, $\overline{\{\, 1/n : n \in \mathbb{N} \,\}} = \{\, 1/n : n \in \mathbb{N} \,\} \cup \{0\}$.

3. The set $\mathbb{Q}$, and also its complement, the set of irrational numbers $\mathbb{Q}^c$, do not
have interior (or exterior) points in $\mathbb{R}$. Every real number is a boundary point of $\mathbb{Q}$.

Similarly every complex number is a boundary point of $\mathbb{Q} + i\mathbb{Q}$.

4. The set $\{m\}$ in $\mathbb{Z}$ does not have any boundary points; it is an open set in $\mathbb{Z}$
($B_{1/2}(m) = \{m\}$).

► Notice that whether a point is in the interior (or boundary, or exterior) of a set 
depends on the metric space under consideration. For example, $\{m\}$ is open in $\mathbb{N}$
but not open in $\mathbb{R}$; the interval $]a, b[$ is open in $\mathbb{R}$, but not open when considered
as a subset of the x-axis in $\mathbb{R}^2$. We thus need to specify that a set $A$ is open in $X$.

5. Describe the interior, boundary and exterior of the sets

$\{\, (x, y) \in \mathbb{R}^2 : |x| + |y| < 1 \,\}$, $\{\, (x, y) \in \mathbb{R}^2 : \tfrac12 < \max(|x|, |y|) < 1 \,\}$.

6. Of the proper intervals in $\mathbb{R}$, only $]a, b[$, $]a, \infty[$, and $]-\infty, a[$ are open.

7. In $\mathbb{R}^2$, the half-plane $\{\, (x, y) \in \mathbb{R}^2 : y > 0 \,\}$ and the rectangles $]a, b[ \,\times\, ]c, d[ \,:=
\{\, (x, y) \in \mathbb{R}^2 : a < x < b,\ c < y < d \,\}$ are open sets.

8. ► $A^c$ has the same boundary as $A$; its interior is the exterior of $A$, that is,
$(\bar{A})^c = (A^c)^\circ$ (and $\bar{A} = A^{c \circ c}$); so $\partial A = \bar{A} \cap \overline{A^c}$.

9. Find an open subset of $\mathbb{R}$, apart from $\mathbb{R}$ itself, without an exterior.

So, the exterior of the exterior of $A$ need not be the interior of $A$. Similarly, the
boundary of $\bar{A}$ or $A^\circ$ need not equal the boundary of $A$.

10. ► An infinite intersection of open sets need not be open. For example, in $\mathbb{R}$, the
open intervals $]-1/n, 1/n[$ are nested one inside another. Their intersection is
the non-open set $\{0\}$ (prove!). Find another example in $\mathbb{R}^2$.

11. Deduce from the theorem that if every $\{x\}$ is open in $X$, then every subset of $X$
is open in $X$. This ‘extreme’ property is satisfied by $\mathbb{N}$, and also by any discrete
metric space.

12. Any point $x$ with $d(x, a) > r$ is in the exterior of the open ball $B_r(a)$. But the
boundary of $B_r(a)$ need not be the set $\{\, x : d(x, a) = r \,\}$. Find a counterexample
in $\mathbb{Z}$.

13. * Every open set in $\mathbb{R}$ is a countable disjoint union of open intervals. (Hint: An
open set in $\mathbb{R}$ is the disjoint union of open intervals; take a rational interior point
for each.)

In contrast to this simple case, the open sets in $\mathbb{R}^2$, say, can be much more
complicated: there is no simple characterization of them, apart from the definition.

14. Can a set not have limit points? Can an infinite set not have limit points? 

15. In R, the set of integers Z has no limit points, but all real numbers are limit points 
of Q. 

16. (a) 1 is an interior isolated point of { 1, 2 } in Z; 

(b) 1 is a boundary isolated point of { 1, 2 } in R; 

(c) 1 is an interior limit point of [0, 2] in R; 

(d) 1 is a boundary limit point of [0, 1] in R. 

17. In R and Q, an isolated point of a subset must be a boundary point, or, equiva- 
lently, an interior point is a limit point. 


2.2 Closed Sets 
Definition 2.15 


A set F is closed in a space X when X\F is open in X. 


Proposition 2.16 

A set $F$ is closed $\iff$ $F$ contains its boundary $\iff$ $F = \bar{F}$.


Proof We have already seen that the boundary of a set $F$ and of its complement $F^c$
are the same (because the interior of $F^c$ is the exterior of $F$). So $F$ is closed, and
$F^c$ open, precisely when this common boundary does not belong to $F^c$, but belongs
instead to $F^{cc} = F$. □

Examples 2.17

1. In $\mathbb{R}$, the set $[a, b]$ is closed, since $\mathbb{R} \setminus [a, b] = \,]-\infty, a[ \,\cup\, ]b, \infty[$ is the union of
two open sets, hence itself open. Similarly $[a, \infty[$ and $]-\infty, a]$ are closed in $\mathbb{R}$.

2. $\mathbb{N}$ and $\mathbb{Z}$ are closed in $\mathbb{R}$, but $\mathbb{Q}$ is not.

3. ► In any metric space $X$, the following sets are closed in $X$ (by inspecting their
complements):

(a) the singleton sets $\{x\}$,

(b) the ‘closed balls’ $B_r[a] := \{\, x \in X : d(x, a) \le r \,\}$,

(c) $X$ and $\emptyset$,

(d) the boundary of any set (the complement of $\partial A$ is $A^\circ \cup (A^c)^\circ$).

4. ► The complement of an open set is closed. More generally, if $U$ is an open set
and $F$ a closed set in $X$, then $U \setminus F$ is open and $F \setminus U$ is closed. The reasons are
that $U \setminus F = U \cap F^c$ and $(F \setminus U)^c = F^c \cup U$.

Closed sets are complements of open ones, and their properties reflect this: 

Proposition 2.18 


The finite union of closed sets is closed. 
Any intersection of closed sets is closed. 


Proof These are the complementary results for open sets (Theorem 2.11). For $F$, $G$
closed sets in $X$, $F^c$, $G^c$ are open, so the result follows from

$$(F \cup G)^c = F^c \cap G^c, \qquad \Big(\bigcap_i F_i\Big)^c = \bigcup_i F_i^c,$$

and the definition that the complement of a closed set is open. □

Theorem 2.19 Kuratowski’s closure ‘operator’ 

$\bar{A}$ is the smallest closed set containing $A$, called the closure of $A$.

$$A \subseteq B \implies \bar{A} \subseteq \bar{B}; \qquad \bar{\bar{A}} = \bar{A}; \qquad \overline{A \cup B} = \bar{A} \cup \bar{B}.$$

Proof The complement of $\bar{A}$ is the exterior of $A$, which is an open set, so $\bar{A}$ is closed.
This implies $\bar{\bar{A}} = \bar{A}$ by Proposition 2.16.
If $A \subseteq B$, then an exterior point of $B$ is obviously an exterior point of $A$, that
is $(\bar{B})^c \subseteq (\bar{A})^c$; so $\bar{A} \subseteq \bar{B}$. It follows that if $F$ is any closed set that contains $A$,
then $\bar{A} \subseteq \bar{F} = F$, and this shows that $\bar{A}$ is the smallest closed set containing $A$.
(Alternatively, Proposition 2.9 can be used: how?)

Of course, $\bar{A} \subseteq \overline{A \cup B}$ follows from $A \subseteq A \cup B$; combined with $\bar{B} \subseteq \overline{A \cup B}$, it
gives $\bar{A} \cup \bar{B} \subseteq \overline{A \cup B}$. Moreover, $\bar{A} \cup \bar{B}$ is a closed set which contains $A \cup B$, and
so must contain its closure $\overline{A \cup B}$. □

Exercises 2.20 

1. It is easy to find sets in R which are neither open nor closed (so contain only 
part of their boundary). Can you find any that are both open and closed? 

The terms “open” and “closed” are misnomers, but they have stuck in the liter- 
ature, being derived from the earlier use of “open/closed intervals”. 

2. The set $\{\, x \in \mathbb{Q} : x^2 < 2 \,\}$ is closed, and open, in $\mathbb{Q}$.

3. In any metric space, a finite collection of points $\{\, a_1, \ldots, a_n \,\}$ is a closed set.

4. The following sets are closed in $\mathbb{R}$: $[0, 1] \cup \{5\}$, $\bigcup_{n \in \mathbb{N}} [n, n + \tfrac12]$.

5. The infinite union of closed sets may, but need not, be closed. For example, the
set $\bigcup_{n=1}^{\infty} \{1/n\}$ is not closed in $\mathbb{R}$; which boundary point is not contained in it?

6. Find two disjoint closed sets (in $\mathbb{R}^2$ or $\mathbb{Q}$, say) that are arbitrarily close to each
other.

7. Start with the closed interval $[0, 1]$; remove the open middle interval $]\tfrac13, \tfrac23[$ to
get two closed intervals $[0, \tfrac13] \cup [\tfrac23, 1]$. Remove the open middle third of each of
these intervals to obtain four closed intervals $[0, \tfrac19] \cup [\tfrac29, \tfrac13] \cup [\tfrac23, \tfrac79] \cup [\tfrac89, 1]$.
If we continue this process indefinitely we end up with the Cantor set. Show it is a
closed set.

8. Denote the decimal expansion of any number in $[0, 1]$ by $0.n_1 n_2 n_3 \ldots$. Show
that

$$\Big\{\, x \in [0, 1] : x = 0.n_1 n_2 n_3 \ldots \implies \frac{n_1 + \cdots + n_k}{k} < 5 \ \ \forall k \,\Big\}$$

is closed in $\mathbb{R}$.

9. ► One can define the “distance” between a point and a subset of a metric space
by $d(x, A) := \inf_{a \in A} d(x, a)$. Then $x \in \bar{A}$ exactly when $d(x, A) = 0$.

10. Let $x$ be an exterior point of $A$, and let $y \in \bar{A}$ have the least distance between $x$
and $\bar{A}$. Do you think that $y$ is unique? or that it must be on the boundary of $A$?
Prove or disprove. For starters, take the metric space to be $\mathbb{R}^2$.

11. Show equality need not hold in $\overline{A \cap B} \subseteq \bar{A} \cap \bar{B}$. Indeed two disjoint sets may
‘touch’ at a common boundary point.

12. Show the complementary results of the theorem: $A^\circ \cap B^\circ = (A \cap B)^\circ$, $A^{\circ\circ} = A^\circ$.

13. If $A \subseteq B$, does it follow that $A^\circ \subseteq B^\circ$?
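
The construction in Exercise 7 can be followed step by step with exact arithmetic. This is a minimal sketch, assuming the standard middle-thirds construction; the function name `cantor_stage` is illustrative, and the code only lists the closed intervals remaining after finitely many removals, it does not represent the Cantor set itself.

```python
from fractions import Fraction

def cantor_stage(k: int):
    """Closed intervals (left, right) remaining after k middle-third removals."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals.append((a, a + third))   # keep the left third
            next_intervals.append((b - third, b))   # keep the right third
        intervals = next_intervals
    return intervals

if __name__ == "__main__":
    for k in range(3):
        stage = cantor_stage(k)
        total = sum(b - a for a, b in stage)
        print(k, [(str(a), str(b)) for a, b in stage], "total length", total)
```

Each stage is a finite union of closed intervals, and the endpoints produced at one stage are never removed at a later one.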


Dense Subsets 

We often need to approximate an element $x \in X$ to within some small distance $\epsilon$
by an element from some special subset $A \subseteq X$. The elements of $A$ may be simpler
to describe, or more practical to work with, or may have nicer theoretical qualities.
For example, computers cannot handle arbitrary real numbers and must approximate
them by rational ones; polynomials are easier to work with than general continuous
functions. The property that elements of a set $A$ can be used to approximate elements
of $X$ to within any $\epsilon$, namely,

$$\forall x \in X,\ \forall \epsilon > 0,\ \exists a \in A,\quad d(x, a) < \epsilon,$$

is equivalent to saying that any ball $B_\epsilon(x)$ contains elements of $A$, in other words $A$
has no exterior points.
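
The approximation of real numbers by rationals mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name `rational_within` is hypothetical, and the "real number" is itself a floating-point value, so the code demonstrates the idea of the condition rather than the density of the rationals.

```python
import math
from fractions import Fraction

def rational_within(x: float, eps: float) -> Fraction:
    """Return a rational p/q with |x - p/q| < eps.

    Choose q with 1/q < eps (Archimedean property), then round x down
    to the nearest multiple of 1/q.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    q = math.ceil(1 / eps) + 1      # guarantees 1/q < eps
    p = math.floor(x * q)           # p/q <= x < (p + 1)/q
    return Fraction(p, q)

if __name__ == "__main__":
    for eps in (0.5, 0.01, 1e-6):
        a = rational_within(math.pi, eps)
        print(eps, a, abs(math.pi - float(a)) < eps)
```

The same pattern (pick a denominator from $\epsilon$, then a numerator from $x$) mirrors the quantifier order in the displayed condition: the approximant is allowed to depend on both $x$ and $\epsilon$.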

Definition 2.21 


A set $A$ is dense in $X$ when $\bar{A} = X$ (so $\bar{A}$ contains all balls).
A set $A$ is nowhere dense in $X$ when $\bar{A}$ contains no balls.


Exercises 2.22 

1 . ► Q is dense in R. (This is equivalent to the Archimedean property of R.) More 
generally, a set A is dense in R when for any two distinct real numbers x < y, 
there is an element a e A between them x < a < y. 

2. The intersection of two open dense sets is again open and dense. 

3. A finite union of straight lines in R 2 is nowhere dense. Z and the Cantor set are 
nowhere dense in R. 

4. Nowhere dense sets have no interior points. 

5. $A$ is nowhere dense in $X$ $\iff$ $X \setminus \bar{A}$ is dense in $X$ $\iff$ $\bar{A}$ is the boundary of an
open set.

6. * What are the nowhere dense sets in $\mathbb{R}$? (Hint: Exercise 2.14(13))


Remarks 2.23 

1. If $d(x, y) = 0$ does not guarantee $x = y$, but $d$ satisfies the other two axioms,
then it is called a pseudo-distance. In this case, let us say that points $x$ and $y$
are indistinguishable when $d(x, y) = 0$ ($\iff \forall z,\ d(x, z) = d(y, z)$). This is
an equivalence relation, which induces a partition of the space into equivalence
classes $[x]$. The function $D([x], [y]) := d(x, y)$ is then a legitimate well-defined
metric.

In a similar vein, if $d$ satisfies the triangle inequality, but is not symmetric, then
$D(x, y) := d(x, y) + d(y, x)$ is symmetric and still satisfies the triangle inequality.

Positivity of $d$ follows from axioms (i) and (ii), $d(x, y) \ge |d(x, z) - d(y, z)| \ge 0$.

2. The axioms for a distance can be re-phrased as axioms for balls:

(a) $B_0(x) = \emptyset$, $\bigcap_{r > 0} B_r(x) = \{x\}$, $\bigcup_{r > 0} B_r(x) = X$,

(b) $\{\, y : x \in B_r(y) \,\} = B_r(x)$,

(c) $B_s \circ B_r(x) \subseteq B_{r+s}(x)$, i.e., if $y \in B_s(z)$ where $z \in B_r(x)$ then $y \in B_{r+s}(x)$.

3. The concept of open sets is more basic than that of distance. One can give a set $X$
a collection of open sets satisfying the properties listed in Theorem 2.11 (taken
as axioms), and study them without any reference to distances. It is then called a
topological space; most theorems about metric spaces have generalizations that
hold for topological spaces. There are some important topological spaces that are
not metric spaces, e.g. the arbitrary product of metric spaces $\prod_i X_i$, and spaces
of functions $X^Y := \{\, f : Y \to X \,\}$.


Chapter 3 

Convergence and Continuity 


3.1 Convergence 

The previous chapter was primarily intended to expand our vocabulary of 
mathematical terms in order to better describe and clarify the concepts that we will 
need. Our first task is to define convergence. 

Definition 3.1 


A sequence $(x_n)$ in a metric space $X$ converges to a limit $x$, written
$x_n \to x$ as $n \to \infty$, when

$$\forall \epsilon > 0,\ \exists N,\quad n \ge N \implies x_n \in B_\epsilon(x).$$

A sequence which does not converge is said to diverge.

One may express this as “any neighborhood of x contains all the sequence from 
some point onwards,” or “eventually, the sequence points get arbitrarily close to the 
limit”. 

Proposition 3.2 


In a metric space, a sequence $(x_n)$ can only converge to one limit, denoted
$\lim_{n \to \infty} x_n$.


Proof Suppose $x_n \to x$ and $x_n \to y$ as $n \to \infty$, with $x \ne y$. Then they can be
separated by two disjoint balls $B_r(x)$ and $B_r(y)$ (Proposition 2.5). But convergence
means

$$\exists N_1,\quad n \ge N_1 \implies x_n \in B_r(x),$$

$$\exists N_2,\quad n \ge N_2 \implies x_n \in B_r(y).$$

For $n \ge \max(N_1, N_2)$ this would result in $x_n \in B_r(x) \cap B_r(y) = \emptyset$, a
contradiction. □

Fig. 3.1 Hausdorff. Felix Hausdorff (1868-1942) studied atmospheric refraction in
Bessel’s school at Leipzig in 1891. In 1914, at 46 years of age
in the University of Bonn, he published his major work on set
theory, with chapters on partially ordered sets, measure spaces,
topology and metric spaces, where he built upon Fréchet’s abstract
spaces, using open sets and neighborhoods. Later, in
1919, he introduced fractional dimensions. But in the late 1930s,
increasing Nazi persecution made life impossible for him.


Examples 3.3 

1. In any metric space, $x_n \to x \iff d(x_n, x) \to 0$ as $n \to \infty$ (because
$x_n \in B_\epsilon(x) \iff d(x_n, x) < \epsilon$). For example, $x_n \to x$ when $d(x_n, x) \le 1/n$
holds.

2. In $\mathbb{R}$, $n/(n + 1) \to 1$ as $n \to \infty$, since for any $\epsilon > 0$, there is an $N$ such that $1/N < \epsilon$
(Archimedean property of $\mathbb{R}$), so

$$n \ge N \implies \left| \frac{n}{n + 1} - 1 \right| = \frac{1}{n + 1} < \frac{1}{N} < \epsilon.$$

3. Given two convergent real sequences $a_n \to a$ and $b_n \to b$, then $a_n + b_n \to a + b$.
Proof For any $\epsilon > 0$, there are $N_1$, $N_2$, such that

$$n \ge N_1 \implies |a_n - a| < \epsilon, \qquad n \ge N_2 \implies |b_n - b| < \epsilon.$$

Thus for $n \ge \max(N_1, N_2)$,

$$|(a_n + b_n) - (a + b)| \le |a_n - a| + |b_n - b| < 2\epsilon.$$

4. ► A sequence $\binom{x_n}{y_n}$ in $X \times Y$ converges to $\binom{x}{y}$ if, and only if, $x_n \to x$ and $y_n \to y$.

Proof Any distance in Example 2.2(6) can be used, but we will use the standard
metric here. The distance between $\binom{x_n}{y_n}$ and $\binom{x}{y}$ is

$$\delta := d\left(\binom{x_n}{y_n}, \binom{x}{y}\right) = d(x_n, x) + d(y_n, y) \to 0, \quad \text{as } n \to \infty.$$

As both $d(x_n, x)$ and $d(y_n, y)$ are at most $\delta$, the converse follows.

5. Consider a composition of functions $\mathbb{N} \to \mathbb{N} \to X$ where the first function is 1-1,
and the second is a sequence. A subsequence is the case when the first function is
strictly increasing, and a rearrangement is the case when it is 1-1 and onto. For
example, 1, 1/4, 1/9, ... is a subsequence of $(1/n)$, and 1/2, 1, 1/4, 1/3, ... is a
rearrangement. Any such ‘sub-selection’ of a convergent sequence also converges,
to the same limit.

Proof Suppose $n > N \implies d(x_n, x) < \epsilon$. Let $(x_{n_i})$ be a sub-selection of $(x_n)$.
As $n_i \le N$ can only be true of a finite number of indices $i$, with the largest, say,
$M$, it follows that

$$i > M \implies n_i > N \implies d(x_{n_i}, x) < \epsilon.$$

6. A sequence converges fast (or ‘linearly’) when $d(x_n, x) \le A c^n$ for some real
constants $A > 0$, $0 < c < 1$. Quadratic convergence, $d(x_n, x) \le A c^{2^n}$, is even
faster. Instead 1 / n and f/2 converge slowly. 
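
The role of $N$ in the definition, and the difference between fast and slow convergence, can be made concrete with a small experiment. This is a sketch only: the helper names `x`, `index_for` and `terms_needed` are assumptions introduced here, and a finite check over a range of indices illustrates the definition without proving convergence.

```python
import math

def x(n: int) -> float:
    """The sequence x_n = n/(n + 1) from Example 2 above."""
    return n / (n + 1)

def index_for(eps: float) -> int:
    """An N with 1/N < eps, which exists by the Archimedean property."""
    return math.floor(1 / eps) + 1

def terms_needed(error, tol: float) -> int:
    """First n >= 1 at which the error bound error(n) drops below tol."""
    n = 1
    while error(n) >= tol:
        n += 1
    return n

if __name__ == "__main__":
    for eps in (0.5, 0.1, 0.001):
        N = index_for(eps)
        ok = all(abs(x(n) - 1) < eps for n in range(N, N + 1000))
        print(f"eps={eps}: N={N}, x_n within eps for the next 1000 terms: {ok}")
    # Compare a geometric error bound c^n (c = 1/2) with the slow bound 1/n.
    print(terms_needed(lambda n: 0.5 ** n, 1e-4))
    print(terms_needed(lambda n: 1 / n, 1e-4))
```

The geometric bound reaches a given tolerance after a number of terms proportional to the number of digits required, whereas the bound $1/n$ needs a number of terms proportional to the reciprocal of the tolerance.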


Limits and Closed Sets 

There are many questions in analysis of the type: If $x_n$ has a property $A$, and $x_n \to x$,
does x still have this property? For example, if a convergent sequence of vectors in 
the plane lies on a circle, will its limit also lie on the same circle? Or, can continuous 
functions (or differentiable, or integrable, etc.) converge to a discontinuous function? 
The following proposition answers this question in a general setting: the ‘property’ 
A needs to be closed in the metric space. 

Proposition 3.4 


If $x_n \in A$ and $x_n \to x$, then $x \in \bar{A}$.

Conversely, in a metric space, for any $x \in \bar{A}$ there is a sequence $x_n \in A$
which converges to $x$.


In particular, closed sets are “closed” under the process of taking the limit.

Proof Take any ball $B_\epsilon(x)$ about $x$. If $x_n$ converges to $x$, then all the sequence points
will be in the ball for $n$ large enough. Since $x_n \in A$, $x$ cannot be an exterior point,
and so $\lim_{n \to \infty} x_n = x \in \bar{A}$. Of course, when $A$ is closed, $\bar{A} = A$ (Proposition 2.16).

For the converse, let $B_{1/n}(x)$ be a decreasing sequence of nested balls around
$x \in \bar{A}$; whether $x$ is a boundary or interior point of $A$, $B_{1/n}(x)$ contains at least a
point $a_n$ in $A$ (which could be $x$ itself). So $d(a_n, x) < 1/n \to 0$ as $n \to \infty$, and
$a_n \to x$. □

Exercises 3.5

1. ► In $\mathbb{R}$,

(a) $1/n \to 0$ (this is a rewording of the Archimedean property of the real numbers:
for every $x > 0$, there is an $n \in \mathbb{N}$ such that $n > x$).

(b) $a^n \to 0$ when $0 < a < 1$, but diverges for $a > 1$. (Hint: When $1 < a = 1 + \delta$,
then $a^n = 1 + n\delta + \cdots > n\delta$; otherwise consider $1/a$.)

(c) $n/a^n \to 0$ when $a > 1$, hence $n^k/a^n = (n/b^n)^k \to 0$ (where $b := a^{1/k} > 1$).

(d) $\sqrt[n]{a} \to 1$ for any $a > 0$, and $n^{1/n} \to 1$ (so $(\log n)/n \to 0$). (Assuming
$a > 1$, expand $a^{1/n} =: 1 + a_n$ using the binomial theorem to show that
$a_n < a/n \to 0$; similarly show $a_n^2 < 2/(n - 1)$ for the second sequence.)

(e) * $(1 + 1/n)^n$ converges to a number denoted $e$. This is too hard to show for
the moment. Show at least that the sequence is increasing but bounded by 3,
using the binomial theorem. (This highlights the need of “convergence tests”:
how can one know that a sequence converges when the limit is unknown?)

(f) $\sqrt[n]{n!} \to \infty$ (what should this mean?)

2. What do the sequences $\sqrt{2 + \sqrt{2 + \sqrt{2 + \cdots}}}$ and $1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}$ converge to, assuming
they do?

3. In $\mathbb{R}$, if $a_n \to 0$ then $a_n^n \to 0$; find examples where (i) $a_n^n \to 0$ but $a_n \not\to 0$,
(ii) $a_n \to 1$ but $a_n^n \not\to 1$.

4. ► If $a_n \le b_n$ for two convergent real sequences then $\lim_{n \to \infty} a_n \le \lim_{n \to \infty} b_n$ (Hint:
$[0, \infty[$ is closed). In particular, if $a_n$ converges and $a_n < a$, then $\lim_{n \to \infty} a_n \le a$.

5. Squeezing principle: In $\mathbb{R}$, if $a_n \le x_n \le b_n$ and $\lim_{n \to \infty} a_n = a = \lim_{n \to \infty} b_n$, then $x_n$
converges (to $a$).

6. It is possible for a divergent sequence to have a convergent subsequence. Find
one in the sequence $(1, -1, 1, -1, \ldots)$. But any rearrangement must diverge.

7. ► We may occasionally encounter ‘sequences’ with two indices $(a_{mn})$ (they are
more properly called nets). The example $n/(n + m)$ shows that in general

$$\lim_{m \to \infty} \lim_{n \to \infty} a_{mn} \ne \lim_{n \to \infty} \lim_{m \to \infty} a_{mn}.$$

The same example shows that, in $\mathbb{R}$, generally, $\sup_n \inf_m a_{nm} \ne \inf_m \sup_n a_{nm}$.
But the following are true:

(a) $\sup_n \sup_m a_{nm} = \sup_m \sup_n a_{nm}$,

(b) $\sup_n (a_n + b_n) \le \sup_n a_n + \sup_n b_n$.

8. If $x_n \to x$ in a metric space $X$, and $x_n \ne x$ for all $n$, then $x$ is a limit point of the
set $\{\, x_1, x_2, x_3, \ldots \,\}$. But if $x_n$ is eventually constant ($\exists N$, $n \ge N \implies x_n = x$), then
$x_n \to x$ without $x$ being a limit point of $\{x_n\}$.

Note that given a real sequence $(x_n)$, even one that does not converge, the largest
limit point of $\{x_n\}$ is denoted $\limsup x_n$, and the smallest $\liminf x_n$ (if they exist).


3.2 Continuity 

One is often not particularly interested in the actual values of the distances between 
points: no new theorems will result by substituting metres with feet. What matters 
more, in most cases, is convergence. Accordingly, functions that preserve conver- 
gence (rather than distance) take on a central importance. 

Definition 3.6 


A function $f : X \to Y$ between metric spaces is continuous when it preserves
convergence,

$$x_n \to x \text{ in } X \implies f(x_n) \to f(x) \text{ in } Y.$$


In this case therefore, $f(\lim_{n \to \infty} x_n) = \lim_{n \to \infty} f(x_n)$. Before we see any examples,
let us prove that the following three statements are equivalent formulations of
continuity in metric spaces, so any of them can be taken as the definition of continuity.

Theorem 3.7

A function $f : X \to Y$ between metric spaces is (i) continuous
$\iff$ (ii) $\forall x \in X,\ \forall \epsilon > 0,\ \exists \delta > 0$,

$$d_X(x, x') < \delta \implies d_Y(f(x), f(x')) < \epsilon,$$

$\iff$ (iii) For every open set $V$ in $Y$, $f^{-1}V$ is open in $X$.

The second statement is often written as $\lim_{x' \to x} f(x') = f(x)$ for all $x$.

Proof (i) $\Rightarrow$ (ii) Suppose statement (ii) is false; then there is a point $x \in X$ and an
$\epsilon > 0$ such that arbitrarily small changes to $x$ can lead to sudden variations in $f(x)$,

$$\forall \delta > 0,\ \exists x',\quad d_X(x, x') < \delta \ \text{AND}\ d_Y(f(x), f(x')) \ge \epsilon.$$

In particular, letting $\delta = 1/n$, there is a sequence $x_n \in X$ (see footnote 1) satisfying $d_X(x, x_n) <
1/n$ but $d_Y(f(x), f(x_n)) \ge \epsilon$. This means that $x_n \to x$, but $f(x_n) \not\to f(x)$,
contradicting statement (i).


1 This selection of points $x_n$ needs the Axiom of Choice for justification.

(ii) $\Rightarrow$ (iii) Note that (ii) can be rewritten as

$$\forall x \in X,\ \forall \epsilon > 0,\ \exists \delta > 0,\quad x' \in B_\delta(x) \implies f(x') \in B_\epsilon(f(x))$$

or even as

$$\forall x \in X,\ \forall \epsilon > 0,\ \exists \delta > 0,\quad f B_\delta(x) \subseteq B_\epsilon(f(x)).$$

Let $V$ be an open set in $Y$. To show that $U := f^{-1}V$ is open in $X$, let $x$ be any point
of $U$; then $f(x) \in V$, which is open. Hence

$$f(x) \in B_\epsilon(f(x)) \subseteq V,$$

and so

$$\exists \delta > 0,\quad f B_\delta(x) \subseteq B_\epsilon(f(x)) \subseteq V.$$

In other words, $x$ is an interior point of $U$,

$$\exists \delta > 0,\quad B_\delta(x) \subseteq f^{-1}V = U.$$

(iii) $\Rightarrow$ (i) Let $(x_n)$ be a sequence converging to $x$. Consider any open neighborhood
$B_\epsilon(f(x))$ of $f(x)$. Then $f^{-1}B_\epsilon(f(x))$ contains $x$, and is an open set by (iii), so

$$\exists \delta > 0,\quad x \in B_\delta(x) \subseteq f^{-1}B_\epsilon(f(x)),$$

$$\implies \exists \delta > 0,\quad f B_\delta(x) \subseteq B_\epsilon(f(x)).$$

But eventually all the points $x_n$ are inside $B_\delta(x)$,

$$\exists N > 0,\quad n > N \implies x_n \in B_\delta(x)$$

$$\implies f(x_n) \in f B_\delta(x) \subseteq B_\epsilon(f(x))$$

$$\implies d_Y(f(x_n), f(x)) < \epsilon.$$

This shows that $f(x_n) \to f(x)$ as $n \to \infty$. □

Examples 3.8

1. The square root function on $[0, \infty[$ is continuous.

Proof Let $x, \epsilon > 0$, and $\delta := \epsilon\sqrt{x}$ (for $x = 0$, choose $\delta = \epsilon^2$), then

$$|x - y| < \delta \implies |\sqrt{x} - \sqrt{y}| = \frac{|x - y|}{\sqrt{x} + \sqrt{y}} < \frac{\delta}{\sqrt{x}} = \epsilon.$$

2. Let $X$, $Y$, $Z$ be metric spaces, then the function $h : X \to Y \times Z$ defined by
$h(x) := (f(x), g(x))$ is continuous if, and only if, $f$, $g$ are continuous. For
example, the circle path $\theta \mapsto (\cos\theta, \sin\theta)$ is a continuous map $\mathbb{R} \to \mathbb{R}^2$.

Proof The statement follows directly from Example 3.3(4):

$$\binom{f(x_n)}{g(x_n)} \to \binom{f(x)}{g(x)} \iff f(x_n) \to f(x) \ \text{AND}\ g(x_n) \to g(x).$$

3. ► If $f : X \to Y$ is continuous, then $f\bar{A} \subseteq \overline{fA}$. So if $A$ is dense in $X$, then $fA$
is dense in $fX$.

Proof If $x \in \bar{A}$, then there is a sequence of elements of $A$ that converge to $x$,
$x_n \to x$ (Proposition 3.4). By continuity of $f$, $f(x_n) \to f(x)$, so $f(x) \in \overline{fA}$.
It follows that if $\bar{A} = X$ then $fX \subseteq \overline{fA} \cap fX$.
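
The choice of $\delta$ in Example 1 can be tested numerically. The following is a sketch under stated assumptions: the function names `delta_for_sqrt` and `epsilon_delta_holds` are illustrative, the points $y$ are sampled at random (with a fixed seed), and floating-point sampling can only illustrate the implication, not prove it.

```python
import math
import random

def delta_for_sqrt(x: float, eps: float) -> float:
    """The delta of Example 1: eps * sqrt(x) for x > 0, and eps^2 for x = 0."""
    return eps * math.sqrt(x) if x > 0 else eps ** 2

def epsilon_delta_holds(x: float, eps: float, trials: int = 10_000, seed: int = 0) -> bool:
    """Sample y in [0, oo[ with |x - y| < delta; check |sqrt(x) - sqrt(y)| < eps."""
    rng = random.Random(seed)
    delta = delta_for_sqrt(x, eps)
    for _ in range(trials):
        y = max(0.0, x + rng.uniform(-delta, delta))   # stay inside the domain
        if abs(x - y) < delta and abs(math.sqrt(x) - math.sqrt(y)) >= eps:
            return False
    return True

if __name__ == "__main__":
    for x in (0.0, 0.25, 1.0, 100.0):
        for eps in (1.0, 0.1, 0.001):
            print(x, eps, delta_for_sqrt(x, eps), epsilon_delta_holds(x, eps))
```

Note that $\delta$ depends on the point $x$ as well as on $\epsilon$, which is exactly what statement (ii) of Theorem 3.7 allows.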

The following three propositions affirm that continuity is well-behaved with 
respect to composition and products, and that the distance function is continuous. 
They allow us to build up continuous functions from simpler ones. 

Proposition 3.9 


If $f : X \to Y$ and $g : Y \to Z$ are continuous, so is $g \circ f : X \to Z$.


Proof Let $x_n \to x$ in $X$. Then by continuity of $f$, $f(x_n) \to f(x)$ in $Y$, and by
continuity of $g$,

$$g \circ f(x_n) = g(f(x_n)) \to g(f(x)) = g \circ f(x) \text{ in } Z.$$

Alternatively, let $W$ be any open set in $Z$. Then $g^{-1}W$ is an open set in $Y$, and so
$f^{-1}g^{-1}W$ is an open set in $X$. But this set is precisely $(g \circ f)^{-1}W$. □

Proposition 3.10


The distance function $d : X^2 \to \mathbb{R}$ is continuous.

Proof Let $x_n \to x$ and $y_n \to y$ in $X$. Then, by the triangle inequality,

$$|d(x_n, y_n) - d(x, y)| \le |d(x_n, y_n) - d(x, y_n)| + |d(x, y_n) - d(x, y)|$$
$$\le d(x_n, x) + d(y_n, y) \to 0,$$

which gives $d(x_n, y_n) \to d(x, y)$ as $n \to \infty$.


Homeomorphisms 

Continuous functions preserve convergence, a central concept in metric spaces; in 
this sense, they correspond to homomorphisms of groups and rings, which preserve 
the group and ring operations. The analogue of an isomorphism is called a homeo- 
morphism: 

Definition 3.11 


A homeomorphism between metric spaces $X$ and $Y$ is a mapping $J \colon X \to Y$
such that

$J$ is bijective (1-1 and onto),

$J$ is continuous,

$J^{-1}$ is continuous.

A metric space X is said to be embedded in another space Y, when there is a 
subset Z C Y such that X is homeomorphic to Z. 


Like all other isomorphisms, “$X$ is homeomorphic
to $Y$” is an equivalence relation on metric spaces.

When $X$ and $Y$ are homeomorphic, they are not only
the same as sets (the bijection part) but also with
respect to convergence:

$x_n \to x \iff J(x_n) \to J(x),$

and

$A$ is open in $X \iff JA$ is open in $Y$.

The elements of $Y$ are those of $X$ in different clothing, as far as convergence is
concerned. The most vivid picture is that of “deforming” one space continuously
and reversibly into the other. The by-now classic example is that a ‘teacup’ is
homeomorphic to a ‘doughnut’.

Exercises 3.12


1. Any constant function $f \colon x \mapsto y_0 \in Y$ is continuous. The identity function
$I \colon X \to X$, $x \mapsto x$, is always continuous.

2. The functions that map the real number $x$ to $x + 1$, $2x$, $x^n$ ($n \in \mathbb{N}$), $a^x$ ($a > 0$),
and $|x|$ are all continuous.

3. In $\mathbb{R}$, addition and multiplication are continuous, i.e., if $x_n \to x$ and $y_n \to y$ then
$x_n + y_n \to x + y$ and $x_n y_n \to xy$. Deduce that if $f, g \colon X \to \mathbb{R}$ are continuous
functions, then so are $f + g$ and $fg$. For example, the polynomials on $\mathbb{R}$ are
continuous. The function $\max \colon \mathbb{R}^2 \to \mathbb{R}$ is also continuous, i.e.,
$\max(x_n, y_n) \to \max(x, y)$.


4. The function $f \colon {]0, \infty[} \to {]0, \infty[}$, defined by $f(x) := 1/x$, is continuous.

5. Conjugation in $\mathbb{C}$, $z \mapsto \bar{z}$, is continuous.

6. In $\mathbb{R}$, the characteristic function $1_A(x) := 1$ if $x \in A$, $0$ if $x \notin A$, is always discontinuous
except when $A = \emptyset$ or $A = \mathbb{R}$. Is this true for all metric spaces?

7. When $f \colon X \to \mathbb{R}$ is a continuous function, the set $\{ x \in X : f(x) > 0 \}$ is open
in $X$.


8. Any function $f \colon \mathbb{N} \to \mathbb{N}$ is continuous.

9. The graph of a continuous function $f \colon X \to Y$, namely $\{ (x, f(x)) : x \in X \}$,
is closed in $X \times Y$ (with the $D_1$ metric).

10. Find examples of continuous functions $f$ (in $X = Y = \mathbb{R}$) such that

(a) $f$ is invertible but $f^{-1}$ is not continuous.

(b) $f(x_n) \to f(x)$ in $Y$ but $(x_n)$ does not converge at all.

(c) $U$ is open in $X$ but $fU$ is not open in $Y$. However, functions which map
open sets to open sets do exist (find one) and are called open mappings.

11. If $F$ is a closed set in $Y$ and $f \colon X \to Y$ is a continuous function, then $f^{-1}F$
is closed in $X$. But $f$ may map a closed set to a non-closed set (even if $f$ is an
open mapping!).

12. It is not enough that $f(x, y)$ is continuous in $x$ and $y$ separately in order that $f$
be continuous. For example, show that the function

$f(x, y) := \frac{xy}{x^2 + y^2}, \quad f(0, 0) := 0,$

is discontinuous at $(0, 0)$ even though $f(x_n, 0) \to 0$, $f(0, y_n) \to 0$, when
$x_n \to 0$, $y_n \to 0$. It needs to be “jointly continuous” in the sense that
$f(x_n, y_n) \to f(x, y)$ for any $(x_n, y_n) \to (x, y)$.

13. The function $f(x) := p(\sqrt{x})$ on the domain $\mathbb{R}^+$, where $p$ is a polynomial, is
continuous.

14. The roots of a quadratic equation $ax^2 + bx + c = 0$ vary continuously as the
coefficients change (but maintaining $b^2 \ge 4ac$), except at $a = 0$.

15. ► Use the continuity of $d$ to find a short proof that the sphere $S_r := \{ y : d(x, y) = r \}$
is closed.

16. Given a set $A \subseteq X$, the map $x \mapsto d(x, A)$ is continuous.
(Hint: $d(y, A) \le d(y, x) + d(x, A)$.)

17. Given disjoint non-empty closed subsets $A, B \subseteq X$, find a continuous function
$f \colon X \to [0, 1]$ such that $fA = 0$, $fB = 1$. (Hint: use $d(x, A)$ and $d(x, B)$.)

18. Every interval in $\mathbb{R}$ is homeomorphic to $[0, 1]$, $[0, \infty[$, or $\mathbb{R}$.

19. $\mathbb{N}$ is homeomorphic to the discrete metric space on a countable set, but $\mathbb{Q}$ is
not. (Hint: The convergent sequence $1/n \to 0$ must correspond to a divergent
sequence in $\mathbb{N}$.)

20. O A bent line in the plane, consisting of two straight line segments meeting at 
their ends, is homeomorphic to the unbent line. Thus angles are meaningless 
as far as homeomorphisms are concerned; triangles, squares and circles are 
homeomorphic. 


Chapter 4 

Completeness and Separability 


4.1 Completeness 

Our task of rigorously defining convergence in a general space has been achieved, 
but there seems to be something circular about it, because convergence is defined 
in terms of a limit. For example, take a convergent sequence $x_n \to x$ in a metric
space $X$, and “artificially” remove the point $x$ to form $X \setminus x$ (assume $\forall n, x_n \ne x$).
The other points $x_n$ still form a sequence in this subspace, but it no longer converges
(otherwise it would have converged to two points in $X$); its limit is “missing”. The
sequence $(x_n)$ is convergent in $X$ but divergent in $X \setminus x$. How are we to know whether
a metric space has “missing” points? And if it has, is it possible to create them when 
the bigger space X is unknown? 

To be more concrete, let us take a look at the rational numbers: consider the 
sequences $(1, 2, 3, \ldots)$, $(1, -1, 1, -1, \ldots)$, and $(1, 1.5, 1.417, 1.414, 1.414, \ldots)$,
the last one defined iteratively by $a_0 := 1$, $a_{n+1} := \frac{a_n}{2} + \frac{1}{a_n}$. It is easy to show
that the first two do not converge, but, contrary to appearances, neither does the
third, the reason being that were it to converge to $a \in \mathbb{Q}$, then $a = a/2 + 1/a$,
implying $a^2 = 2$, which we know cannot be satisfied by any rational number. This
sequence seems a good candidate of one which converges to a “missing” number not
found in $\mathbb{Q}$. Having found one missing point, there are an infinite number of them:
$(2, 2.5, 2.417, 2.414, \ldots)$ and $(2, 3, 2.834, 2.828, \ldots)$ cannot converge in $\mathbb{Q}$.
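
The iteration above can be followed exactly in rational arithmetic. The following is a minimal sketch, assuming Python's standard `fractions` module; the function name `newton_sqrt2_terms` is hypothetical and chosen only for this illustration. It shows that every term is a rational number whose square gets very close to 2 without ever equalling it.

```python
from fractions import Fraction

def newton_sqrt2_terms(count):
    """Return the first `count` terms of a_0 = 1, a_{n+1} = a_n/2 + 1/a_n as exact rationals."""
    a = Fraction(1)
    terms = [a]
    for _ in range(count - 1):
        a = a / 2 + 1 / a  # exact rational arithmetic, no rounding
        terms.append(a)
    return terms

terms = newton_sqrt2_terms(6)
for n, a in enumerate(terms):
    # Each a_n is rational; a_n^2 - 2 shrinks quickly but is never exactly 0.
    print(n, a, float(a * a - 2))
```

The printed differences $a_n^2 - 2$ only suggest the behaviour for the computed terms; that no rational limit exists is the argument given in the text.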

But could it be that the first two sequences also converge to “missing” numbers? 
How are we to distinguish between sequences that “truly” diverge from those that 
converge to “missing” points? There is a property that characterizes intrinsic convergence:
suppose that $(x_n)$ is divergent in the metric space $Y$, but converges $x_n \to a$
in a bigger space $X$. Then the points get close to each other (in $Y$),

$d_Y(x_n, x_m) = d_X(x_n, x_m) \le d_X(x_n, a) + d_X(a, x_m) \to 0$, as $n, m \to \infty$.


J. Muscat, Functional Analysis, DOI: 10. 1007/978-3-3 19-06728-5_4, 
© Springer International Publishing Switzerland 2014 



Definition 4.1 

A Cauchy sequence is one such that $d(x_n, x_m) \to 0$ as $n, m \to \infty$, that is,
$\forall \epsilon > 0, \exists N, n, m \ge N \Rightarrow d(x_n, x_m) < \epsilon$.

To clarify this idea further, we prove: 

Proposition 4.2 

Two sequences $(x_n)$, $(y_n)$ are defined to be asymptotic when $d(x_n, y_n) \to 0$
as $n \to \infty$.

(i) Being asymptotic is an equivalence relation.

(ii) For $(x_n)$ asymptotic to $(y_n)$,

(a) if $(x_n)$ is Cauchy then so is $(y_n)$,

(b) if $(x_n)$ converges to $x$ then so does $(y_n)$.

(iii) A sequence $(x_n)$ is Cauchy if, and only if, every subsequence of $(x_n)$
is asymptotic to $(x_n)$.


Proof (i) Let $(x_n) \sim (y_n)$ signify $d(x_n, y_n) \to 0$ as $n \to \infty$. Reflexivity and
symmetry of $\sim$ are obvious. If $(x_n) \sim (y_n) \sim (z_n)$ then transitivity holds:

$d(x_n, z_n) \le d(x_n, y_n) + d(y_n, z_n) \to 0$ as $n \to \infty$.

(ii) If $d(x_n, y_n) \to 0$ and $d(x_n, x_m) \to 0$ as $n, m \to \infty$, then

$d(y_n, y_m) \le d(y_n, x_n) + d(x_n, x_m) + d(x_m, y_m) \to 0.$

Similarly, if $d(x_n, x) \to 0$ then $d(y_n, x) \le d(y_n, x_n) + d(x_n, x) \to 0$.

(iii) A Cauchy sequence satisfies

$\forall \epsilon > 0, \exists N, n, m \ge N \Rightarrow d(x_n, x_m) < \epsilon.$

Given a subsequence $(x_{n_i})$, its indices satisfy $n_i \ge i$ (by induction on $i$: $n_1 \ge 1$,
$n_2 > n_1 \ge 1$ so $n_2 \ge 2$, etc.). Thus

$i \ge N \Rightarrow n_i, i \ge N \Rightarrow d(x_{n_i}, x_i) < \epsilon$

and $d(x_{n_i}, x_i) \to 0$ as $i \to \infty$.

Conversely, suppose $(x_n)$ is not Cauchy. Then

$\exists \epsilon > 0, \forall i, \exists n_i, m_i \ge i, d(x_{n_i}, x_{m_i}) \ge \epsilon,$

from which we can create the subsequences $(x_{n_1}, x_{n_2}, \ldots)$ and $(x_{m_1}, x_{m_2}, \ldots)$. If
both these subsequences were asymptotic to $(x_n)$ then there would exist an $N$ such
that $i > N$ implies $d(x_i, x_{n_i}) < \epsilon/2$ as well as $d(x_i, x_{m_i}) < \epsilon/2$. Combining these
two then gives a contradiction

$d(x_{n_i}, x_{m_i}) \le d(x_i, x_{n_i}) + d(x_i, x_{m_i}) < \epsilon,$

so one of the two subsequences is not asymptotic to $(x_n)$. □

Examples 4.3 

1. Convergent sequences are always Cauchy, since if $x_n \to x$ then
$d(x_n, x_m) \to d(x, x) = 0$ by continuity of the distance function. But the discussion above
gives examples of Cauchy sequences which do not converge.

2. In $\mathbb{R}$ or $\mathbb{Q}$, any increasing sequence that is bounded above, $a_n \le b$, is Cauchy.

Proof Split the interval $[a_0, b]$ into subintervals of length $\epsilon$. Let $I$ be the last
subinterval which contains a point, say $a_N$. As the sequence is increasing, $I$ contains
all of the sequence from $N$ onward, proving the statement.


3. $\mathbb{R}$ and $\mathbb{Q}$ have the bisection property: Let $[a_0, b_0]$
be an interval in $\mathbb{R}$ or $\mathbb{Q}$, and divide it into halves,
$[a_0, c]$ and $[c, b_0]$, where $c := (a_0 + b_0)/2$
is the midpoint. Choose $[a_1, b_1]$ to be either
$[a_0, c]$ or $[c, b_0]$ randomly or according to some
criterion; continue taking midpoints to get a
nested sequence of intervals $[a_n, b_n]$, whose
lengths are

$b_n - a_n = (b_0 - a_0)/2^n \to 0.$

So, for any $\epsilon > 0$, there is an $N > 0$ such that $b_N - a_N < \epsilon$, and for any $n \ge N$,
$a_n, b_n \in [a_N, b_N]$. Hence $(a_n)$ and $(b_n)$ are asymptotic Cauchy sequences.
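
The bisection procedure just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function `bisect` and the selection criterion (keep the left half when $c^2 \ge 2$) are hypothetical choices made here, and exact rationals from Python's `fractions` module are used so that the computation stays inside $\mathbb{Q}$.

```python
from fractions import Fraction

def bisect(a0, b0, choose_left, steps):
    """Halve [a0, b0] `steps` times; choose_left(c) decides which half is kept."""
    a, b = Fraction(a0), Fraction(b0)
    intervals = [(a, b)]
    for _ in range(steps):
        c = (a + b) / 2          # midpoint, still rational
        if choose_left(c):
            b = c                # keep [a, c]
        else:
            a = c                # keep [c, b]
        intervals.append((a, b))
    return intervals

# Assumed criterion for this sketch: keep the left half whenever c*c >= 2.
intervals = bisect(1, 2, lambda c: c * c >= 2, 20)
a_n, b_n = intervals[-1]
print(float(a_n), float(b_n), b_n - a_n)
```

With this criterion the end points satisfy $a_n^2 < 2 \le b_n^2$ at every step, so both rational sequences are asymptotic and Cauchy, yet neither can converge in $\mathbb{Q}$.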

4. Let $B_{r_n}$ be a nested sequence of balls ($B_{r_{n+1}} \subseteq B_{r_n}$), with $r_n \to 0$. Then choosing
any points $x_n \in B_{r_n}$ gives a Cauchy sequence.

Proof For any $m \ge n$,

$x_m \in B_{r_m} \subseteq \cdots \subseteq B_{r_n}$

so that $d(x_m, x_n) < 2r_n \to 0$ as $n, m \to \infty$.

5. A Cauchy sequence cannot stray too far in the sense that $d(x_0, x_n) \le R$ for all
$n$, for some $R \ge 0$. Hence Cauchy sequences are “bounded”.

Proof By the definition of a Cauchy sequence for $\epsilon := 1$ say, there is an $N$ such
that $n, m \ge N \Rightarrow d(x_n, x_m) < \epsilon$. Therefore

$d(x_0, x_n) \le d(x_0, x_N) + d(x_N, x_n) < d(x_0, x_N) + \epsilon.$

6. A Cauchy sequence in $\mathbb{Q}$ either converges to 0, or is eventually greater than some
$\epsilon > 0$ or less than some $-\epsilon < 0$. In each case, an asymptotic sequence behaves
in the same manner.

Proof If $a_n \not\to 0$ yet is Cauchy, then

$\exists \epsilon > 0, \forall M, \exists m \ge M, |a_m| \ge \epsilon,$

$\exists N, m, n \ge N \Rightarrow |a_n - a_m| < \epsilon/2.$

Assuming, for example, $a_m \ge \epsilon$ for some $m \ge N$,

$n \ge N \Rightarrow a_n \ge a_m - |a_n - a_m| > \epsilon/2.$

If $(b_n)$ is an asymptotic sequence, there is an $M$ such that $|a_n - b_n| < \epsilon/4$
whenever $n \ge M$, and so

$n \ge \max(N, M) \Rightarrow b_n \ge a_n - |a_n - b_n| > \epsilon/2 - \epsilon/4 = \epsilon/4.$

Complete Metric Spaces 
Definition 4.4 


A metric space is complete when every Cauchy sequence in it converges. 


In a complete metric space, there are no “missing” points and any divergent 
sequence is “truly” divergent — there is no bigger metric space which makes it con- 
vergent. 

It follows that the space of rational numbers Q (with the standard metric) is not 
complete, a fact that allegedly deeply troubled Pythagoras and his followers. They 
shouldn’t have worried because there is a way of creating the missing numbers (but 
skip the proof if it worries you on a first reading!): 

Theorem 4.5 


The real number space $\mathbb{R}$ is complete.

Proof (i) For this to be a theorem, we need to be clear about what constitutes $\mathbb{R}$. The
usual definition is that it is a set with an addition $+$ and multiplication $\cdot$ which satisfy
the axioms of a field (see p. 9), and with a linear order relation $\le$ that is compatible
with these operations:

$x \le y \Rightarrow x + z \le y + z, \quad x, y \ge 0 \Rightarrow xy \ge 0,$

and in addition satisfies the completeness axiom:

Every non-empty subset $A$ of $\mathbb{R}$ with an upper bound has a least upper bound.

Assuming all these axioms, let $(a_n)$ be a Cauchy sequence in $\mathbb{R}$, that is, for any $\epsilon > 0$,
there is an $N$ beyond which $|a_n - a_m| < \epsilon$. Let

$B := \{ x \in \mathbb{R} : \exists M, n \ge M \Rightarrow x \le a_n \}.$

Its elements might be called eventual lower bounds of $\{ a_n : n \in \mathbb{N} \}$. The fact that
Cauchy sequences are bounded implies that $\{ a_n : n \in \mathbb{N} \}$ has a lower bound and so
$B \ne \emptyset$, while any upper bound of $\{ a_n : n \in \mathbb{N} \}$ is also one of $B$. Hence, by the
completeness axiom, $B$ has a least upper bound $\alpha$. Two facts follow,

(a) $\alpha + \epsilon$ is not an element of $B$, so there must be an infinite number of terms
$a_{n_i} < \alpha + \epsilon$;

(b) $\alpha - \epsilon$ is not an upper bound of $B$, so there must exist an $x \in B$ and an $M$ such
that $n \ge M \Rightarrow \alpha - \epsilon < x \le a_n$.

These facts together imply that for $n_i \ge M$ we have $\alpha - \epsilon \le a_{n_i} \le \alpha + \epsilon$. Then
$n \ge \max(M, N) \Rightarrow |a_n - \alpha| \le |a_n - a_{n_i}| + |a_{n_i} - \alpha| < 2\epsilon$
as required to show $a_n \to \alpha$.

This proof is open to the criticism that we have not proved whether, in fact, there
exists such a set with all these properties. We need to fill this logical gap by giving
a construction of $\mathbb{R}$ that satisfies these axioms.

(ii) The whole idea is to treat the Cauchy sequences of rational numbers themselves
as the missing numbers! How can a sequence be a number? Actually, this is not
really that novel: the familiar decimal representation of a real number is a particular
Cauchy sequence; $e := 2.71828\ldots$ is just short for $(2, 2.7, 2.71, 2.718, \ldots)$. There is
of course nothing special about the decimal system: the binary expansion
$(2, 2\tfrac{1}{2}, 2 + \tfrac{1}{2} + \tfrac{1}{8}, \ldots)$, along with several other Cauchy sequences, also converges to $e$. We should
be grouping these asymptotic Cauchy sequences together, and treat each class as one
real number. For example, the asymptotic sequences $0.32999\ldots$ and $0.33000\ldots$
represent the same real number.
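
As a small check of the last remark, the truncations of $0.32999\ldots$ and $0.33000\ldots$ can be compared term by term. The sketch below assumes Python's `fractions` module; the helper names `nines` and `zeros` are hypothetical and exist only for this illustration.

```python
from fractions import Fraction

def nines(n):
    """n-th truncation of 0.32999...: the digits 0.32 followed by n nines."""
    return Fraction(32, 100) + sum(Fraction(9, 10 ** (k + 2)) for k in range(1, n + 1))

def zeros(n):
    """n-th truncation of 0.33000...: always equal to 33/100."""
    return Fraction(33, 100)

# The distance between corresponding terms is exactly 10**-(n+2), which tends to 0.
gaps = [zeros(n) - nines(n) for n in range(1, 8)]
print(gaps)
```

Since the gaps tend to 0, the two sequences are asymptotic in the sense of Proposition 4.2 and so fall in the same class.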

Accordingly, $\mathbb{R}$ is defined as the set of equivalence classes of asymptotic Cauchy
sequences of rational numbers; each real number is here written as $x = [a_n]$ (instead
of the cumbersome $[(a_n)]$). We now develop the structure of $\mathbb{R}$: addition and multiplication,
its order and distance function. Define

$x + y = [a_n] + [b_n] := [a_n + b_n], \quad xy = [a_n][b_n] := [a_n b_n].$

That addition is well-defined follows from an application of the triangle inequality in
$\mathbb{Q}$; that it has the associative and commutative properties follows from the analogous
properties for addition of rational numbers. The new real zero is $[0, 0, \ldots]$, and the
negatives are $-x = -[a_n] = [-a_n]$. Similarly, multiplication is well defined and
has all the properties that make $\mathbb{R}$ a field.

It is less straightforward to define an inequality relation on $\mathbb{R}$. Let $(a_n) > 0$ mean
that the Cauchy sequence $(a_n)$ is eventually strictly positive (Example 4.3(6)),

$\exists \epsilon \in \mathbb{Q}^+, \exists N, n \ge N \Rightarrow a_n \ge \epsilon > 0.$

Any other asymptotic Cauchy sequence must also eventually be strictly positive.
Correspondingly, let $x < y$ mean that $y - x > 0$, or equivalently,

$[a_n] < [b_n] \iff \exists \epsilon \in \mathbb{Q}^+, \exists N, \forall n \ge N, a_n + \epsilon < b_n.$

This immediately shows that $x < y \iff x + z < y + z$. We make a few more
observations about this relation:

1. if $a_n \ge 0$ for all $n$, then $[a_n] \ge 0$,

2. if $0 < x$ and $0 < y$ then $0 < xy$ and $0 < x + y$ (gives transitivity of $\le$),

3. $x > 0$ OR $x = 0$ OR $x < 0$ (Example 4.3(6)),

4. if $x < 0$ then $-x > 0$.

Anti-symmetry of $\le$ follows from the fact that $(b_n - a_n)$ cannot eventually be both
strictly positive and strictly negative. This makes $\mathbb{R}$ a linearly ordered field.

Given a real number $x = [a_n] = [b_n]$, let $|x| := [|a_n|]$, which makes sense since

$||a_n| - |a_m|| \le |a_n - a_m| \to 0$ as $n, m \to \infty$,

$||a_n| - |b_n|| \le |a_n - b_n| \to 0$ as $n \to \infty$.

In fact $|x| = x$ when $x \ge 0$ and $|x| = -x$ when $x < 0$, so it satisfies the properties
$|x| \ge 0$, $|x| = 0 \iff x = 0$, $|-x| = |x|$, and $|x + y| \le |x| + |y|$. Thus
$d(x, y) := |x - y|$ is a distance, as in Example 2.2(1).

$\mathbb{Q}$ is dense in $\mathbb{R}$: Note that a rational number $a$ can be represented in $\mathbb{R}$ by the constant
sequence $[a, a, \ldots]$. The Archimedean property holds since $[a_n] > 0$ implies
that eventually $a_n \ge p > 0$, $\exists p \in \mathbb{Q}$, so $[a_n] \ge [p/2] > 0$. Also, if $x = [a_n]$ then
$a_n \to x$ in $\mathbb{R}$, since for any $\epsilon > 0$, let $p \in \mathbb{Q}$, $0 < p < \epsilon$, so

$\exists N, n, m \ge N \Rightarrow |a_n - a_m| < p$
$\Rightarrow d(a_n, x) = d([a_n, a_n, \ldots], [a_1, a_2, \ldots])$
$= [|a_n - a_1|, |a_n - a_2|, \ldots]$
$< \epsilon.$

The completeness axiom is satisfied: Let $A$ be any non-empty subset of $\mathbb{R}$ that is
bounded above. Split $\mathbb{R}$ into the set $B$ of upper bounds of $A$, and its complement $B^c$,
both of which are non-empty, say $a_0 \in B^c$, $b_0 \in B$; these can even be taken to be
rational, by the Archimedean property.

Divide $[a_0, b_0]$ in two using the midpoint $c := (a_0 + b_0)/2$: if $c \in B$ then select
$[a_1, b_1] = [a_0, c]$, otherwise take $[a_1, b_1] = [c, b_0]$. Continue dividing and selecting
sub-intervals like this, to get two asymptotic Cauchy sequences $(a_n)$, $(b_n)$, with
$b_n \in B$, $a_n \in B^c$. Let $\alpha := [a_n]$, so $a_n \to \alpha$, $b_n \to \alpha$, and (Exercise 3.5(4))

$\forall a \in A, a \le b_n \Rightarrow \forall a \in A, a \le \alpha$, so $\alpha$ is an upper bound of $A$,

$\forall b \in B, a_n \le b \Rightarrow \forall b \in B, \alpha \le b$, so $\alpha$ is the least upper bound.

A dual argument shows that every non-empty set with a lower bound has a greatest
lower bound, denoted $\inf A$.

$\mathbb{R}$ is complete: This now follows from part (i), but we can see this directly in this
context. Start with any Cauchy sequence of real numbers (in decimal form, say) and
replace each number by a rational number to an increasing number of significant
places, for example:

$x_n \in \mathbb{R}$     $\mapsto$     $a_n \in \mathbb{Q}$
2.6280...          2
2.7087...          2.7
2.7173...          2.71
2.7181...          2.718

The crucial point is that the two sequences are asymptotic by construction. Since the
first one is Cauchy, so must be the second one. But a Cauchy sequence of rational
numbers is, by definition, a real number $x$. Moreover, $a_n \to x$ implies $x_n \to x$. □

This “completion” process generalizes readily to any metric space. 

Theorem 4.6 


Every metric space $X$ can be completed, that is, there is a complete metric
space $\tilde{X}$, containing (a dense copy of) $X$ and extending its distance
function.


Any such complete metric space $\tilde{X}$ is called the completion of $X$.

Proof Construction of $\tilde{X}$: Let $C$ be the set of Cauchy sequences of $X$. For any
two Cauchy sequences $a = (x_n)$, $b = (y_n)$, the real sequence $d(x_n, y_n)$ is also
Cauchy (Exercise 4.10(6)), and since $\mathbb{R}$ is complete, it converges to a real number
$D(a, b) := \lim_{n \to \infty} d(x_n, y_n)$. Symmetry and the triangle inequality of $D$ follow
from that of $d$, by taking the limit $n \to \infty$ in the following:

$d(y_n, x_n) = d(x_n, y_n) \Rightarrow D(b, a) = D(a, b),$
$d(x_n, y_n) \le d(x_n, z_n) + d(z_n, y_n) \Rightarrow D(a, b) \le D(a, c) + D(c, b).$

The only problem is that $D(a, b) = 0$, meaning $d(x_n, y_n) \to 0$, is perfectly possible
without $a = b$. It happens when the Cauchy sequences $(x_n)$, $(y_n)$ are asymptotic.
We have already seen that this is an equivalence relation, so $C$ partitions into equivalence
classes. Write $\tilde{d}([a], [b]) := D(a, b)$; it is well-defined since for any other
representative sequences $a' \in [a]$ and $b' \in [b]$, we have

$D(a', b') \le D(a', a) + D(a, b) + D(b, b') = D(a, b);$

similarly $D(a, b) \le D(a', b')$; so $D(a, b) = D(a', b')$. Let $\tilde{X}$ be this space of
equivalence classes of Cauchy sequences, with the metric $\tilde{d}$.

There is a dense copy of $X$ in $\tilde{X}$: For any $x \in X$, there corresponds the constant
sequence $\mathbf{x} := (x, x, \ldots)$ in $C$. Since

$\tilde{d}([\mathbf{x}], [\mathbf{y}]) = D((x), (y)) = \lim_{n \to \infty} d(x, y) = d(x, y),$

this set of constant sequences is a true copy of $X$, preserving distances between points.
To show that this copy is dense in $\tilde{X}$, we need to show that any representative Cauchy
sequence $a = (x_n)$ in $C$ has constant sequences arbitrarily close to it. By the definition
of Cauchy sequences, for any $\epsilon > 0$, there is an $N \in \mathbb{N}$ with $d(x_n, x_N) < \epsilon$ for $n \ge N$.
Let $\mathbf{x}$ be the constant sequence $(x_N)$. Then $D(a, \mathbf{x}) = \lim_{n \to \infty} d(x_n, x_N) \le \epsilon < 2\epsilon$
proves that $[\mathbf{x}]$ is within $2\epsilon$ of $[a]$.

$\tilde{X}$ is complete: Let $([a_n])$ be a Cauchy sequence in $\tilde{X}$; this means
$\tilde{d}([a_n], [a_m]) = D(a_n, a_m) \to 0$, as $n, m \to \infty$. For each $n$, we can find a constant sequence $\mathbf{x}_n$
which is as close to $a_n$ as needed, i.e., $D(\mathbf{x}_n, a_n) < \epsilon_n$; by choosing $\epsilon_n \to 0$, we
can select $(\mathbf{x}_n)$ to be asymptotic to $(a_n)$. As $(a_n)$ is Cauchy, so is $(\mathbf{x}_n)$. In fact,
$\mathbf{x}_n \to x := (x_n)$ since

$\lim_{n \to \infty} D(\mathbf{x}_n, x) = \lim_{m, n \to \infty} d(x_n, x_m) = 0,$

so that the asymptotic sequence $a_n$ also converges to $x$, and $[a_n]$ to $[x]$. □

Proving that a given metric space is complete is normally quite hard: Even showing 
that a particular Cauchy sequence converges may not be an easy matter because one 
has to identify which point it converges to, let alone doing this for arbitrary Cauchy
sequences. But once a space is shown to be complete, one need not go through the
same proof process to show that a subspace or a product is complete: 

Proposition 4.7 

Let $X$, $Y$ be complete metric spaces. Then,

(i) A subset $F \subseteq X$ is complete $\iff$ $F$ is closed in $X$,

(ii) $X \times Y$ is complete.


Proof (i) Let $F \subseteq X$ be complete, i.e., any Cauchy sequence in $F$ converges to a
limit in $F$. Let $x \in \bar{F}$, with a sequence $x_n \to x$, $x_n \in F$ (Proposition 3.4). Since
convergent sequences are Cauchy and $F$ is complete, $x$ must be in $F$. Thus $\bar{F} = F$
is closed. The completeness of $X$ has not been used, so in fact a complete subspace
of any metric space is closed.

Conversely, let $F$ be a closed set in $X$ and let $(x_n)$ be a Cauchy sequence in $F$.
Then $(x_n)$ is a Cauchy sequence in $X$, which is complete. Therefore $x_n \to x$ for
some $x \in X$. In fact, $x \in \bar{F} = F$. Thus any Cauchy sequence of $F$ converges in $F$.


(ii) Let $\binom{x_n}{y_n}$ be a Cauchy sequence in $X \times Y$. Recall that

$d\left( \binom{x_n}{y_n}, \binom{x_m}{y_m} \right) := d_X(x_n, x_m) + d_Y(y_n, y_m) \ge d_X(x_n, x_m).$

Since the left-hand sequence converges to 0 as $n, m \to \infty$, we get $d_X(x_n, x_m) \to 0$,
so that the sequence $(x_n)$ is Cauchy in the complete space $X$. It therefore converges
$x_n \to x \in X$. By similar reasoning, $y_n \to y \in Y$. Consequently,

$d\left( \binom{x_n}{y_n}, \binom{x}{y} \right) = d_X(x_n, x) + d_Y(y_n, y) \to 0$ as $n \to \infty$,

which is equivalent to $\binom{x_n}{y_n} \to \binom{x}{y}$ in $X \times Y$. □


Examples 4.8 

1. The completion of a subset $A$ in a complete metric space $X$ is $\bar{A}$.

Proof The completion $Y$ of $A$ must satisfy two criteria: $Y$ must be complete, and
$A$ must be dense in $Y$. Now, $\bar{A}$ is closed in $X$, so is complete, and $A$ is dense in
$\bar{A}$ (by definition).

2. Two metric spaces may be homeomorphic yet one space be complete and the
other not. For example, $\mathbb{R}$ is homeomorphic to $]0, 1[$ (Exercise 3.12(18)), but the
latter is not closed in $\mathbb{R}$.

3. Let $f \colon X \to Y$ be a continuous function. If it can be extended to the completions
as a continuous function $\tilde{f} \colon \tilde{X} \to \tilde{Y}$, then this extension is unique.

Proof Any $x \in \tilde{X}$ has a sequence $(a_n)$ in $X$ converging to it (Proposition 3.4).
As $\tilde{f}$ is continuous, we find that $\tilde{f}(x)$ is uniquely determined by

$\tilde{f}(x) = \lim_{n \to \infty} \tilde{f}(a_n) = \lim_{n \to \infty} f(a_n).$

4. But not every continuous function $f \colon X \to Y$ can be extended continuously to
the completions $\tilde{f} \colon \tilde{X} \to \tilde{Y}$. For example, the continuous function $f(x) := 1/x$
on $]0, \infty[$ cannot be extended continuously to $[0, \infty[$.

5. (Cantor) The completion of $\mathbb{Q}$ to $\mathbb{R}$ has come at a price: $\mathbb{R}$ is not countable. Prove
this by taking the binary expansion of a list of real numbers in $[0, 1]$, arranged in
an infinite array, and creating a new number from the diagonal that is different
from all of them. The next theorem is a strong generalization of this statement.
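
The diagonal construction in Example 5 can be tried on a finite array. This is only a toy sketch: the function `diagonal_complement` and the sample rows are hypothetical, and a finite list cannot replace the actual argument (which must also deal with numbers having two binary expansions).

```python
def diagonal_complement(rows):
    """Flip the n-th binary digit of the n-th row and join the flipped digits."""
    return "".join("1" if row[n] == "0" else "0" for n, row in enumerate(rows))

# Each string stands for the first few binary digits of a number in [0, 1].
rows = ["0101", "1100", "0011", "1111"]
new_row = diagonal_complement(rows)
print(new_row)
```

By construction the new string differs from the $n$-th row in its $n$-th digit, so it cannot appear anywhere in the list.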

Theorem 4.9 Baire’s category theorem 


A complete metric space cannot be covered by a countable number of 
nowhere-dense sets. 


Proof Suppose that the metric space $X = \bigcup_{n=1}^{\infty} A_n$, where $A_n$ are nowhere dense.
We are going to create a nested sequence of balls whose centers form a non-convergent
Cauchy sequence, as follows: To start with, $\bar{A}_1 \ne X$ so its exterior
contains a ball $B_{r_1}(x_1) \subseteq (\bar{A}_1)^c$. Now $\bar{A}_2$ contains no balls, so the open set
$(\bar{A}_2)^c \cap B_{r_1}(x_1)$ is non-empty and there is a ball $B_{r_2}(x_2) \subseteq (\bar{A}_2)^c \cap B_{r_1}(x_1)$.

Continuing like this, we can find a sequence of
points (using the Axiom of Choice)

$x_{n+1} \in B_{r_{n+1}}(x_{n+1}) \subseteq (\bar{A}_{n+1})^c \cap B_{r_n}(x_n).$

Moreover at each stage, $r_n$ can be chosen small
enough that

$r_n \to 0$ (e.g. $r_n < 1/n$),

$B_{r_{n+1}}[x_{n+1}] \subseteq B_{r_n}(x_n)$ (e.g. $r_{n+1} < r_n - d(x_n, x_{n+1})$).

Thus $(x_n)$ is a Cauchy sequence (Example 4.3(4)).

Now suppose that $x_n \to x$. For all $m > n$ we have $x_m \in B_{r_{n+1}}(x_{n+1})$ and taking
the limit $x_m \to x$ we find $x \in B_{r_{n+1}}[x_{n+1}] \subseteq B_{r_n}(x_n)$. Since this holds for any $n$ we
obtain

$x \in \bigcap_n B_{r_n}(x_n) \subseteq \bigcap_n (\bar{A}_n)^c = \Big( \bigcup_n \bar{A}_n \Big)^c \subseteq \Big( \bigcup_n A_n \Big)^c = X^c = \emptyset,$

a contradiction. Having constructed a non-convergent Cauchy sequence, $X$ must be
incomplete. □

Fig. 4.1 Baire. René-Louis Baire (1874-1932), after graduating from Paris
around 1894, tackled the problem of convergence and limits of
functions, namely that no space of functions then known was
“closed” under pointwise convergence. Progress on this issue
was made by his colleague Borel in the direction of measurable
sets.

Exercises 4.10 

1. Any sequence in $\mathbb{Q}$ of the type $(3.1, 3.14, 3.141, 3.1415, \ldots)$ is Cauchy.

2. The sequences $(1, 2, 3, \ldots)$ and $(1, -1, 1, -1, \ldots)$ are not Cauchy.

3. * Try to prove that the sequence defined by $a_0 := 1$, $a_{n+1} := \frac{a_n}{2} + \frac{1}{a_n}$ is Cauchy.

4. If a sequence $(x_n)$, chosen from a finite set of points, e.g. $(x, y, x, x, y, \ldots)$, is
Cauchy then it must eventually repeat $(x_0, \ldots, x_N, x_N, \ldots)$. (Hint: Generalize
Exercise 2.)

5. ► If $d(x_{n+1}, x_n) \le ac^n$ with $c < 1$ then $(x_n)$ is Cauchy. But a sequence which
decreases at the rate $d(x_{n+1}, x_n) \le 1/n$ need not be Cauchy. For example,
use the principle of induction to show that the example in Exercise 3 satisfies
$|a_{n+1} - a_n| \le (\tfrac{1}{2})^{n+1}$.

The following give sufficient conditions for Cauchy sequences:

(a) If $d(x_{n+1}, x_n) \le c\,d(x_n, x_{n-1})$ with $c < 1$, then $d(x_n, x_m) \le ac^n$,

(b) If $d(x_{n+1}, x_n) \le c\,d(x_n, x_{n-1})^2$ with $c\,d(x_1, x_0) < 1$ then $d(x_n, x_m) \le c^{-1}b^{2^n}$,

for $n \le m$ and appropriate constants $a$, $b$.

6. If $(x_n)$, $(y_n)$ are Cauchy sequences in $X$, then so is $d_n := d(x_n, y_n)$ in $\mathbb{R}$.

7. ► A continuous function need not map Cauchy sequences to Cauchy sequences.

8. If $x_n \to x$ and $y_n \to x$, then $(x_n)$, $(y_n)$ are asymptotic.

9. $\sqrt{n}$ and $\sqrt{n + 1}$ are asymptotic divergent sequences in $\mathbb{R}$.

10. ► A subsequence of a Cauchy sequence is itself Cauchy, and if it converges so
does its parent sequence.

11. If $(x_n)$ is a Cauchy sequence, and the set of values $\{ x_n \}$ has a limit point $x$, then
$x_n \to x$.

12. The completion of $]0, 1[$ and of $[0, 1[$ is $[0, 1]$. Any Cauchy sequence in the Cantor
set $C$ must converge in $C$. However a Cauchy sequence of rational numbers
need not converge to a rational number because $\mathbb{Q}$ is not closed in $\mathbb{R}$.

13. ► $\mathbb{R}^N := \mathbb{R} \times \cdots \times \mathbb{R}$ and $\mathbb{C}$ are complete.

14. Is $\mathbb{N}$ complete? Any discrete metric space is complete.

15. (Cantor) We have already seen that the centers of a nested sequence of balls with
$r_n \to 0$ form a Cauchy sequence (Example 4.3(4)). Show, furthermore, that in
a complete metric space, $\bigcap_n B_{r_n}[x_n] = \{ \lim_{n \to \infty} x_n \}$.

16. The only functions $f \colon \mathbb{Q} \to \mathbb{Q}$ satisfying $f(x + y) = f(x) + f(y)$ are $f \colon x \mapsto kx$.
Deduce that the only continuous functions $f \colon \mathbb{R} \to \mathbb{R}$ with this property
are of the same type.

17. * The completion of $X$ is essentially unique, in the sense that any two such
completions (such as the one defined in the theorem) are homeomorphic to each
other.

18. The Cantor set is complete and nowhere dense in $\mathbb{R}$; why doesn’t this contradict
Baire’s theorem?

4.2 Uniformly Continuous Maps 

We have seen that a continuous function need not preserve completeness, or even
Cauchy sequences. If one analyzes the root of the problem, one finds that its resolution
lies in the following strengthening of continuity:

Definition 4.11


A function $f \colon X \to Y$ is said to be uniformly continuous when
$\forall \epsilon > 0, \exists \delta > 0, \forall x \in X, fB_\delta(x) \subseteq B_\epsilon(f(x)).$


The difference from continuity is that, here, $\delta$ is independent of $x$.

Easy Consequences

1. Uniformly continuous functions are continuous.

2. But not every continuous map is uniformly so; an example is $f(x) := 1/x$ on
$]0, \infty[$.

3. ► The composition of uniformly continuous maps is again uniformly continuous.

Proof $\forall \epsilon > 0, \exists \delta, \delta' > 0, \forall x, g(f(B_\delta(x))) \subseteq g(B_{\delta'}(f(x))) \subseteq B_\epsilon(g(f(x))).$
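
To see concretely why $f(x) := 1/x$ on $]0, \infty[$ (consequence 2 above) fails to be uniformly continuous, one can look at the pairs $x = 1/n$, $x' = 1/(n+1)$. The sketch below uses exact rationals; the function name `gap_pairs` is a hypothetical helper for this illustration only.

```python
from fractions import Fraction

def gap_pairs(count):
    """For n = 1..count, return (|x - x'|, |1/x - 1/x'|) with x = 1/n, x' = 1/(n+1)."""
    pairs = []
    for n in range(1, count + 1):
        x, x2 = Fraction(1, n), Fraction(1, n + 1)
        pairs.append((abs(x - x2), abs(1 / x - 1 / x2)))
    return pairs

pairs = gap_pairs(10)
for input_gap, output_gap in pairs:
    # The input gap 1/(n(n+1)) shrinks, while the output gap stays equal to 1.
    print(input_gap, output_gap)
```

No single $\delta$ can therefore serve for $\epsilon = 1$ at every point, which is exactly what uniform continuity would require.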

The key properties of uniformly continuous maps are the following two proposi- 
tions: 

Proposition 4.12 


A uniformly continuous function maps any Cauchy sequence to a Cauchy 
sequence. 


Proof By definition $f \colon X \to Y$ is uniformly continuous when

$\forall \epsilon > 0, \exists \delta > 0, \forall x, x', d_X(x, x') < \delta \Rightarrow d_Y(f(x), f(x')) < \epsilon.$

In particular, for a Cauchy sequence $(x_n)$ in $X$, with this $\delta$,

$\exists N, n, m > N \Rightarrow d_X(x_n, x_m) < \delta$

$\Rightarrow d_Y(f(x_n), f(x_m)) < \epsilon,$

proving that $(f(x_n))$ is a Cauchy sequence in $Y$. □

More generally, the same proof shows that a function $f \colon X \to Y$ is uniformly
continuous if, and only if, it maps any asymptotic sequences $(a_n)$, $(b_n)$ in $X$ to
asymptotic sequences $(f(a_n))$, $(f(b_n))$ in $Y$.

Theorem 4.13 


Every uniformly continuous function $f \colon X \to Y$ has a unique uniformly
continuous extension to the completions $\tilde{f} \colon \tilde{X} \to \tilde{Y}$.


Proof In order not to complicate matters unnecessarily, let us suppose that $X$ and
$Y$ are dense subsets of $\tilde{X}$ and $\tilde{Y}$ respectively, instead of being embedded in them.
Nothing is lost this way, except quite a few extra symbols!

Let $x_n \to x \in \tilde{X}$, with $x_n \in X$. The sequence $f(x_n)$ is Cauchy in $Y$ by the
previous proposition, so must converge to some element $y \in \tilde{Y}$. Furthermore, if
$a_n \to x$ as well ($a_n \in X$), then $(x_n)$ and $(a_n)$ are asymptotic (Exercise 4.10(8))
forcing $f(x_n)$ and $f(a_n)$ to be asymptotic in $Y$, hence $f(a_n) \to y$. This allows us
to define $\tilde{f}(x) := y$ without ambiguity. Moreover, this choice is imperative and $\tilde{f}$ is
unique, if it is to be continuous.

The uniform continuity of $\tilde{f}$ follows from that of $f$. For any $\epsilon > 0$, there is a
$\delta > 0$ for which

$\forall a, b \in X, d(a, b) < \delta \Rightarrow d(f(a), f(b)) < \epsilon.$

Let $x, x' \in \tilde{X}$ with $d(x, x') < \delta$, let $a_n \to x$, $b_n \to x'$ with $a_n, b_n \in X$ and, by the
above, $f(a_n) \to \tilde{f}(x)$, $f(b_n) \to \tilde{f}(x')$. Among these terms, we can find $a$ close to
$x$ and $b$ close to $x'$ to within $r := (\delta - d(x, x'))/2 < \delta$, while also $f(a)$ is close to
$\tilde{f}(x)$ and $f(b)$ is close to $\tilde{f}(x')$ to within $\epsilon$. Then

$d(a, b) \le d(a, x) + d(x, x') + d(x', b) < 2r + d(x, x') = \delta$
$\Rightarrow d(\tilde{f}(x), \tilde{f}(x')) \le d(\tilde{f}(x), f(a)) + d(f(a), f(b)) + d(f(b), \tilde{f}(x')) < 3\epsilon.$

□

The following are easily shown to be uniformly continuous functions: 

Definition 4.14 

A function $f \colon X \to Y$ is called a Lipschitz map when

$\exists c > 0, \forall x, x' \in X, d_Y(f(x), f(x')) \le c\,d_X(x, x').$

Furthermore, it is called

an equivalence (or bi-Lipschitz) when $f$ is bijective and both $f$ and $f^{-1}$ are
Lipschitz,

a contraction when it is Lipschitz with constant $c < 1$,

an isometry, and $X$, $Y$ are said to be isometric, when $f$ preserves distances,
i.e.,

$\forall x, x' \in X, d_Y(f(x), f(x')) = d_X(x, x').$

Examples 4.15 

1. Any / : [a , b] — »■ R with continuous derivative is Lipschitz. 

Proof As f is continuous, it is bounded on [a, b], say |/ , (x)| ^ c. The result 
then follows from the mean value theorem, 

fix) - fix') = fmx - x'), 3$ e ]0, 1[. 

2. To show /: M 2 —> M 2 is Lipschitz, where / = (/i, ff), it is enough to show 
that 
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\fiixu yi) - fix 2 , yi)\ < c(|xi - x 2 | + \yi - _V 2 1 ) , i = 1,2 

for then (using ( a + b ) 2 ^ 2 (a 2 + b 2 ) for a, b e K) 

11 (h%ln) htl S) 11 < l/l(X1 ’ yi) “ /l( * 2 ’ y2)l + lMxun) ~ MX2 ' - V2)l 

^ 2c(|xi - x 2 \ + |yi - y 2 \) 

3. ► Lipschitz maps are uniformly continuous, since for any $\epsilon > 0$, we can let
$\delta := \epsilon/2c$ independent of $x$ to obtain $d(x, x') < \delta \Rightarrow d(f(x), f(x')) \leq c\delta < \epsilon$.

4. But not every uniformly continuous function is Lipschitz. For example, $\sqrt{x}$ on
$[0, 1]$ is uniformly continuous (show!); were it also Lipschitz, it would satisfy
$|\sqrt{x} - \sqrt{0}| \leq c\,|x - 0|$ which leads to $\sqrt{x} \geq 1/c$ for every $x \in \,]0, 1]$, and this fails as $x \to 0$.
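
The failure of the Lipschitz condition for the square root can be seen numerically. The following is a minimal sketch, not part of the original text; the helper name `difference_quotient` and the sample points $10^{-k}$ are illustrative assumptions. It evaluates the slopes $|\sqrt{x} - \sqrt{0}|/|x - 0|$ that any Lipschitz constant $c$ would have to dominate.

```python
import math

def difference_quotient(f, x, y):
    """Return |f(x) - f(y)| / |x - y|, the slope a Lipschitz constant must dominate."""
    return abs(f(x) - f(y)) / abs(x - y)

# Slopes of sqrt between 0 and x = 10^-k grow like 10^(k/2), so no single c works.
quotients = [difference_quotient(math.sqrt, 10.0 ** (-k), 0.0) for k in range(1, 7)]

for k, q in enumerate(quotients, start=1):
    print(f"x = 1e-{k}: slope = {q:.3f}")
```

The slopes keep growing as the sample point approaches 0, which is the numerical counterpart of the inequality $\sqrt{x} \geq 1/c$ failing near 0.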

The next theorem is one of the important unifying principles of mathematics. It 
has applications in such disparate fields as differential equations, numerical analysis, 
and fractals. 

Theorem 4.16 The Banach fixed point theorem 


Let $X \neq \emptyset$ be a complete metric space. Then every contraction map
$f : X \to X$ has a unique fixed point $x = f(x)$, and the iteration $x_{n+1} :=
f(x_n)$ converges to it for any $x_0$.


Proof Consider the iteration $x_{n+1} := f(x_n)$ starting with any $x_0$ in $X$. Note that

$$d(x_{n+1}, x_n) = d(f(x_n), f(x_{n-1})) \leq c\, d(x_n, x_{n-1}).$$

Hence, by induction on $n$,

$$d(x_{n+1}, x_n) \leq c^n d(x_1, x_0),$$

so $(x_n)$ is Cauchy since $c < 1$ (Exercise 4.10(5)). As $X$ is complete, $x_n$ converges
to, say, $x$, and by continuity of $f$,

$$f(x) = f\bigl(\lim_{n\to\infty} x_n\bigr) = \lim_{n\to\infty} f(x_n) = \lim_{n\to\infty} x_{n+1} = x.$$

Moreover, the rate of convergence is given at least by $d(x, x_n) \leq \frac{c^n}{1-c}\, d(x_1, x_0)$.

Suppose there are two fixed points $x = f(x)$ and $y = f(y)$; then

$$d(x, y) = d(f(x), f(y)) \leq c\, d(x, y)$$

implying $d(x, y) = 0$ since $c < 1$.

□ 
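
The iteration in the theorem, together with the a priori bound $d(x, x_n) \leq \frac{c^n}{1-c} d(x_1, x_0)$, can be sketched in a few lines. This is an illustrative sketch only: the function name `fixed_point`, the choice of $f = \cos$ on $[0, 1]$, and the contraction constant $c = \sin 1$ are assumptions made for the example, not taken from the text.

```python
import math

def fixed_point(f, x0, c, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = f(x_n); stop once c^n/(1-c) * d(x_1, x_0) < tol."""
    x = x0
    d10 = abs(f(x0) - x0)                    # d(x_1, x_0)
    for n in range(max_iter):
        bound = c ** n / (1.0 - c) * d10     # guaranteed bound on d(x, x_n)
        if bound < tol:
            return x, n, bound
        x = f(x)
    raise RuntimeError("tolerance not reached")

# cos maps [0, 1] into itself, and |cos'(x)| = |sin x| <= sin(1) < 1 there.
x_star, steps, bound = fixed_point(math.cos, 0.5, c=math.sin(1.0))
print(x_star, steps, bound)
```

The stopping rule uses only the contraction constant and the first step, so the number of iterations is known in advance; in practice the iterates usually converge faster than this worst-case bound suggests.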


Exercises 4.17 

1. Show that 

(a) $f : [a, b] \to \mathbb{R}$, $f(x) := x + 1/x$, is a contraction when $a > 2^{-1/2}$;

(b) $f : [0, 1]^2 \to \mathbb{R}^2$, $f(x, y) := \ldots$, is Lipschitz.

2. The composition of two Lipschitz maps is Lipschitz. 

3. ► A Lipschitz map (with constant $c$) sends the ball $B_r(a)$ into the ball $B_{cr}(f(a))$.

4. Isometries are necessarily 1-1. Onto isometric maps are equivalences, and the 
latter are homeomorphisms. 

5. ► Two metric spaces are said to be equivalent when there is an equivalence 
map between them. Equivalent metric spaces must be both complete or both 
incomplete. 

6. ► If a space has two distances, the inequality $d_1(x, y) \leq c\, d_2(x, y)$, where
$c > 0$, states that the identity map is Lipschitz. In the same vein, two distances
are equivalent when there are $c, c' > 0$ such that $c'\, d_2(x, y) \leq d_1(x, y) \leq
c\, d_2(x, y)$. Show that two equivalent distances have exactly the same Cauchy
sequences.

7. The unit circle has two natural distance functions, (i) the arclength $\theta$ and (ii) the
Euclidean distance $2\sin(\theta/2)$, where $\theta$ is the angle between two points ($\leq \pi$).
Prove that the two are equivalent by first showing

$$2\theta/\pi \leq \sin\theta \leq \theta, \quad 0 \leq \theta \leq \pi/2.$$

8. The distances $D_1$ and $D_\infty$ for $X \times Y$ (Example 2.2(6)) are equivalent.

9. The fixed point theorem can be generalized to the case when $f : B_r[x_0] \to X$
is a contraction map, as long as the starting point satisfies $d(x_0, x_1) < (1 - c)r$.
Use the triangle inequality to show that $x_n$ remain in $B_r[x_0]$.

10. The classic example of an iteration converging to a fixed point is that provided
by a continuously differentiable function $f : \mathbb{R} \to \mathbb{R}$ with $|f'(x)| < 1$; it is a
contraction map in a neighborhood of $x$.

11. If $f : \mathbb{R} \to \mathbb{R}$ is a contraction with Lipschitz constant $c < 1$, then $f(x) = x$ can
also be solved by iterating $x_{n+1} := F(x_n)$ where $F(x) := x - \alpha(x - f(x))$,
$0 < \alpha < 2/(c + 1)$. Hence find an approximate solution of $x = \sin x + 1$ near
to $x = \pi$; experiment by choosing different values of $\alpha$ and compare with the
iteration $x_{n+1} := f(x_n)$.
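
One way to set up the experiment of the last exercise is sketched below. The helper names `iterate` and `relaxed`, the stopping rule on successive iterates, and the value $\alpha = 0.7$ are all assumptions chosen for illustration. Note also that $\sin x + 1$ satisfies only $|f'(x)| \leq 1$ globally, so the sketch relies on the iterates staying where $|\cos x| < 1$.

```python
import math

def iterate(g, x0, tol=1e-12, max_iter=1000):
    """Iterate x <- g(x) until successive iterates differ by less than tol."""
    x = x0
    for n in range(1, max_iter + 1):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new, n
        x = x_new
    raise RuntimeError("no convergence")

def f(x):
    return math.sin(x) + 1.0

def relaxed(alpha):
    """F(x) = x - alpha * (x - f(x)), as in the exercise."""
    return lambda x: x - alpha * (x - f(x))

plain_root, plain_steps = iterate(f, math.pi)
relaxed_root, relaxed_steps = iterate(relaxed(0.7), math.pi)
print(plain_root, plain_steps)
print(relaxed_root, relaxed_steps)
```

Changing the argument of `relaxed` lets one compare step counts for different values of $\alpha$ against the plain iteration.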


4.3 Separable Spaces 


Completeness is a “nice” property that a metric can have. A different type of property 
of a metric space is whether it is, in a sense, “computable” or “constructive”. Starting 
from the simplest, and speaking non-technically, we find: 


Finite metric spaces: There are a finite number of possible distances to
compute.

Countable metric spaces: With an infinite number of points, an algorithm may
still calculate distances precisely, but it may take longer and longer to do so.

Separable metric spaces: Points can be approximated by one of a countable
number of points; any distance can be evaluated, not precisely, but to any accuracy.

Non-separable metric spaces: There may be no algorithm that finds the distance
between two generic points, even approximately.


Non-separable metric spaces are, in a sense, too large, while countable metric 
spaces leave out most spaces of interest. 


Definition 4.18 


A metric space is separable when it contains a countable dense subset,
$$\exists A \subseteq X, \quad A \text{ countable AND } \overline{A} = X.$$


Examples 4.19 

1. Countable metric spaces, such as N, Z, Q, are obviously separable. 

2. ► $\mathbb{R}$ is separable because the countable subset $\mathbb{Q}$ is dense in it. By the next
proposition, $\mathbb{C}$ and $\mathbb{R}^N$ are also separable.¹


¹ There is a catch here: The metric used in the proposition is not the Euclidean one. But
the inequalities used there remain valid for the Euclidean metric,
$\sqrt{d_X(a_n, x)^2 + d_Y(b_m, y)^2} < \sqrt{\epsilon^2/4 + \epsilon^2/4} < \epsilon$.

Proposition 4.20 


(i) Any subset of a separable metric space is separable.

(ii) The product of two separable spaces is separable.

(iii) The image of a separable space under a continuous map is separable.


Proof (i) Let $Y \subseteq X$ and $\overline{A} = X$, with $A = \{a_n : n \in \mathbb{N}\}$ countable. For each
$a_n$, let $Y_{n,m} := \{y \in Y : d(a_n, y) < 1/m\}$, and pick a representative point from
each, $y_{n,m} \in Y_{n,m}$, whenever the set is non-empty. This array of points is certainly
countable, and we now show that it is dense in $Y$.

Fix $0 < \epsilon < \tfrac{1}{2}$; any $y \in Y$ can be approximated by some $a_n \in A$ with
$d(a_n, y) < \epsilon$. Pick the smallest integer $m$ such that $m > 1/(2\epsilon)$; then $m - 1 \leq 1/(2\epsilon)$,
so $m \leq 1/\epsilon$; therefore $\epsilon \leq 1/m < 2\epsilon$. Then $y \in Y_{n,m} \neq \emptyset$, so that there must be a
representative $y_{n,m}$ with $d(a_n, y_{n,m}) < 1/m < 2\epsilon$. Combining the two inequalities,
we get

$$d(y_{n,m}, y) \leq d(y_{n,m}, a_n) + d(a_n, y) < 3\epsilon.$$


(ii) Let $\{a_1, a_2, \ldots\}$ be dense in $X$, and $\{b_1, b_2, \ldots\}$ dense in $Y$. Then for any $\epsilon > 0$
and any pair $(x, y) \in X \times Y$, $x$ can be approximated by some $a_n$ such that $d_X(a_n, x) <
\epsilon/2$, and $y$ by some $b_m$ with $d_Y(b_m, y) < \epsilon/2$; then

$$d\bigl((a_n, b_m), (x, y)\bigr) = d_X(a_n, x) + d_Y(b_m, y) < \epsilon$$

shows that the countable set of points $(a_n, b_m)$ ($n, m \in \mathbb{N}$) is dense in $X \times Y$.


(iii) Let $f : X \to Y$ be continuous and let $A$ be countable and dense in $X$. Then $fA$
is countable because the number of elements of a set cannot increase by a mapping.
Moreover, as $f$ is continuous, $fA$ is dense in $fX$ (Example 3.8(3)), and $fX$ is separable.

□ 
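
Part (ii) of the proof can be illustrated for $\mathbb{R} \times \mathbb{R}$ with the rationals as the dense subset. The sketch below is only an illustration under stated assumptions: the function names `rational_within` and `approximate_pair` are hypothetical, floating-point numbers stand in for real numbers, and the distance used is $d_X + d_Y$ as in the proof.

```python
from fractions import Fraction
import math

def rational_within(x, eps):
    """Return a rational q with |q - x| < eps by rounding x to a fine enough grid."""
    n = math.ceil(1.0 / eps) + 1           # grid spacing 1/n is smaller than eps
    return Fraction(round(x * n), n)

def approximate_pair(x, y, eps):
    """Approximate (x, y) by a rational pair within eps in the metric d_X + d_Y."""
    return rational_within(x, eps / 2), rational_within(y, eps / 2)

a, b = approximate_pair(math.pi, math.sqrt(2.0), 1e-3)
d1 = abs(float(a) - math.pi) + abs(float(b) - math.sqrt(2.0))
print(a, b, d1)
```

Each coordinate is approximated to within $\epsilon/2$, so the sum of the two coordinate distances stays below $\epsilon$, exactly as in the displayed inequality.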


Exercises 4.21 

1. A metric space $X$ is separable when there is a countable number of points $a_n$ such
that the set of balls $B_\epsilon(a_n)$ covers $X$ for any $\epsilon$.

2. * In a separable space, we can do with a countable number of balls (with say 
rational radii), in the sense that every open set is a countable union of some of 
these. It then follows that every cover of the space using open sets has a countable 
subcover. 

3. The union of a (countable) list of separable subsets is separable. 
4. ► If there are an uncountable number of disjoint balls, then the space is
non-separable, e.g. an uncountable set with the discrete metric is non-separable. We
shall meet some non-trivial examples of non-separable metric spaces later on 
(Theorem 9.1). 

Remarks 4.22 

1. * The proof of Baire’s theorem can be modified to show that the countable union
of closed nowhere dense sets in a complete metric space $X$ has empty interior in $X$.

2. Note that $d(f(x), f(y)) < d(x, y)$ does not necessarily give a contraction map.

For example, $f(x) := 2/(\sqrt{x^2 + 4} - x)$. In this case, the iteration $x_{n+1} := f(x_n)$
may satisfy $d(x_{n+1}, x_n) \to 0$ but need not be a Cauchy sequence.

3. The reader has most probably seen images of 
fractals; many of these are the fixed ‘point’, 
or attractor , of a contraction on the space of 
shapes (Example 2.2(4)) (see [19]). 


4. The Banach fixed point theorem is also valid when $f^N = f \circ \cdots \circ f$, rather
than $f$, is a contraction map; in this case the convergence is “cyclic”.
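
The example in Remark 2 can be explored numerically. The sketch below is illustrative only; the starting point $x_0 = 0$ and the number of iterations are arbitrary assumptions. It records the step sizes $d(x_{n+1}, x_n)$ of the iteration for $f(x) = 2/(\sqrt{x^2 + 4} - x)$.

```python
import math

def f(x):
    """f(x) = 2 / (sqrt(x^2 + 4) - x), which equals (sqrt(x^2 + 4) + x) / 2."""
    return 2.0 / (math.sqrt(x * x + 4.0) - x)

x, steps = 0.0, []
for _ in range(1000):
    x_next = f(x)
    steps.append(x_next - x)    # d(x_{n+1}, x_n)
    x = x_next

print(x, steps[0], steps[-1])
```

The steps shrink while the iterates keep growing, consistent with the remark: since $f(x)^2 \geq x^2 + 1$, the sequence is unbounded and so cannot be Cauchy.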




Chapter 5 

Connectedness 


5.1 Connected Sets 

We have an intuitive notion of what it means for a shape to be in one piece. The 
following definition makes this idea precise: 

Definition 5.1 


A subset $C$ of a metric space is disconnected when it can be divided into (at
least) two disjoint non-empty subsets $C = A \cup B$ such that each subset is
covered exclusively by an open set, i.e.,


$A \subseteq U$, $B \cap U = \emptyset$, $U$ open,
$B \subseteq V$, $A \cap V = \emptyset$, $V$ open.

Otherwise a set is called connected. 



Examples 5.2 

1 . Single points are always connected because they cannot be split into two non- 
empty sets. Similarly the empty set is connected. 

2. ► Any subset of Z (or any discrete metric space) is disconnected except the 
single points and the empty set. Metric spaces with this property are called totally 
disconnected. 

Proof Let $C$ contain more than one point, say $a$ and $b$. Take $A = U := \{a\}$
and $B = V := C \setminus \{a\} \neq \emptyset$. Then $U$ and $V$ are open (any subset is open) and
respectively contain $A$ and $B$ exclusively.


J. Muscat, Functional Analysis, DOI: 10. 1007/978-3-3 19-06728-5_5, 
© Springer International Publishing Switzerland 2014 


Kazimierz Kuratowski (1896-1980 Poland) rewrote much of 
Hausdorff’s theory in 1921, introducing his closure axioms and 
connectedness. Similarly Aleksandrov and Urysohn, and later 
Tykhonov, in Moscow, built upon Hausdorff’s work with com- 
pactness. 


Fig. 5.1 Kuratowski 


3. ► A set $A$ is connected when every continuous function $f : A \to \{0, 1\} \subset \mathbb{Z}$ is
constant. Otherwise the open sets $f^{-1}\{0\}$ and $f^{-1}\{1\}$ cover and disconnect $A$.

Proposition 5.3 


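A set $C$ is connected $\Leftrightarrow$ every non-trivial subset of $C$ has a non-empty
boundary in $C$, that is,

$$\emptyset \neq A \subsetneq C \;\Rightarrow\; \partial_C A \neq \emptyset.$$


Here, $\partial_C A = \{x \in C : \forall \epsilon > 0, \exists a \in A \cap C, \exists b \in C \setminus A, \ d(a, x) < \epsilon,
\ d(b, x) < \epsilon\}$.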

Proof Let $\emptyset \neq A \subsetneq C$ be without a boundary in $C$. Then all the points of $C$ are
either interior points or exterior points of $A$; thus $A$ and $B := C \setminus A$ are open in
$C$. But then there are open sets $U$, $V$ in $X$, with $A = U \cap C$ and $B = V \cap C$
(Example 2.12(3)), and

$$U \cap B = U \cap (C \setminus A) = U \cap C \cap A^c = \emptyset$$

(similarly $V \cap A = \emptyset$), so $C = A \cup (C \setminus A) = A \cup B$ is disconnected.

Conversely, if $C$ is disconnected, then $C = A \cup B$, with $A \subseteq U$, $B \subseteq V$, both
non-empty, and $U$, $V$ open sets in $X$ with $A \cap V = \emptyset = B \cap U$. For any point $a \in A$,
$a \in B_r(a) \subseteq U$; hence

$$a \in \{x \in C : d(x, a) < r\} = B_r(a) \cap C \subseteq U \cap C = A$$

shows that $A$ is open in $C$. Similarly $B = C \setminus A$ is open in $C$, hence $A$ is closed in
$C$. This leaves $A$ without a boundary in $C$. □
Theorem 5.4


The connected subsets of R are precisely the intervals. 


Proof Every non-trivial subset of an interval $I \subseteq \mathbb{R}$ has a boundary point: Let $A$
be a non-trivial subset of $I$; that $A$ is non-trivial means that there exist $a_0 \in A$ and
$b_0 \in I \setminus A$. We can assume $a_0 < b_0$, otherwise switch the roles of $A$ and $I \setminus A$ in
what follows.

Divide the interval $[a_0, b_0]$ into halves, $[a_0, c]$ and $[c, b_0]$, where $c := (a_0 + b_0)/2$
is the midpoint. If $c \in A$ let $[a_1, b_1] := [c, b_0]$, otherwise if $c \in A^c$
let $[a_1, b_1] := [a_0, c]$. Continue taking midpoints to get a nested sequence of
intervals $[a_n, b_n]$ in $I$, with $a_n \in A$, $b_n \in I \setminus A$.

By the bisection property (Example 4.3(3)), the sequences $(a_n)$ and $(b_n)$ are
Cauchy and asymptotic, and since $\mathbb{R}$ is complete, they converge $a_n \to a$ and $b_n \to a$.
The consequence is that, inside any open neighborhood $B_\epsilon(a)$, there are points $a_n \in A$
and $b_n \in I \setminus A$, making $a$ a boundary point of $A$. From the preceding proposition,
this translates as “every interval is connected”.

Every connected subset $C$ of $\mathbb{R}$ has the interval property $a, b \in C \Rightarrow
[a, b] \subseteq C$: Let $C$ be a connected set, and let $a, b \in C$ (say, $a < b$). Any $x \in [a, b]$
which is not in $C$ would disconnect $C$ using the disjoint open sets $]-\infty, x[$ and
$]x, \infty[$.

Every subset of $\mathbb{R}$ with the interval property is an interval: Let $A$ have the
interval property. If $A \neq \emptyset$, say $x \in A$, and has an upper bound, then it has a least
upper bound $b$. The interval $[x, b[$ is a subset of $A$ because there are points of $A$
arbitrarily close to $b$. Similarly if $a$ is the greatest lower bound then $]a, x] \subseteq A$.
Going through all the possibilities of whether $A$ has upper bounds or lower bounds
or none, and whether these belong to $A$ or not, results in all the possible cases of
intervals. For example, if it contains its least upper bound $b$ but has no lower bound,
then $[x, b] \subseteq A$ for any $x < b$, so that $A = \,]-\infty, b]$. □
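
The bisection construction in the first part of the proof translates directly into a short procedure. The sketch below is illustrative and rests on assumptions not in the text: the set $A$ is given by a membership test `in_A`, the endpoints satisfy $a_0 < b_0$, and the example set $\{x : x^2 < 2\}$ is hypothetical.

```python
def boundary_point(in_A, a0, b0, tol=1e-12):
    """Bisect [a0, b0] with a0 in A and b0 not in A, keeping a_n in A and b_n outside A."""
    assert in_A(a0) and not in_A(b0)
    a, b = a0, b0
    while b - a > tol:
        c = (a + b) / 2.0
        if in_A(c):
            a = c           # [a_{n+1}, b_{n+1}] := [c, b_n]
        else:
            b = c           # [a_{n+1}, b_{n+1}] := [a_n, c]
    return a, b

# Hypothetical example set: A = {x in [0, 2] : x^2 < 2}.
a, b = boundary_point(lambda x: x * x < 2.0, 0.0, 2.0)
print(a, b)
```

Both endpoints close in on the same limit, here $\sqrt{2}$, with points of $A$ on one side and points outside $A$ on the other, which is what makes the limit a boundary point.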

By contrast, the connected sets in other metric spaces may be very difficult to 
describe and imagine. Even in $\mathbb{R}^2$, there are infinite connected sets such that when
a single point is removed, the remaining set is totally disconnected! (For further 
information search for “Cantor’s teepee”.) Connectedness is an important intrinsic 
property that a set may have: it is preserved by any continuous function. Even though 
the codomain space may be very different from the domain, a connected set remains 
in ‘one piece’. 


Proposition 5.5

Continuous functions map connected sets to connected sets,

$f : X \to Y$ continuous and $C \subseteq X$ is connected $\Rightarrow$ $fC$ is connected.

Proof Let $C$ be a subset of $X$, and suppose $fC$ is disconnected into the non-empty
disjoint sets $A$ and $B$ covered exclusively by the open sets $U$ and $V$, that is,

$$fC = A \cup B \subseteq U \cup V, \quad U \cap B = \emptyset = V \cap A.$$

Then,

$$C \subseteq f^{-1}A \cup f^{-1}B \subseteq f^{-1}U \cup f^{-1}V, \quad f^{-1}U \cap f^{-1}B = \emptyset = f^{-1}V \cap f^{-1}A.$$

Moreover $f^{-1}A$ and $f^{-1}B$ are non-empty and disjoint, and $f^{-1}U$ and $f^{-1}V$ are
open sets (Theorem 3.7). Hence $fC$ disconnected implies $C$ is disconnected. □

Almost surprisingly, this simple proposition is the generalization of the classical 
“Intermediate Value Theorem” of Bolzano and Weierstrass. In effect, IVT has been
dissected into this abstract, but transparent, statement and the previous one that 
intervals are connected. 

Proposition 5.6 Intermediate Value Theorem 


Let $C$ be a connected space, and $f : C \to \mathbb{R}$ a continuous function. For
any $c$ with $f(a) < c < f(b)$ there exists an $x \in C$ such that $f(x) = c$.


Proof $fC$ is connected in $\mathbb{R}$ and so must be an interval. By the interval property,
$f(a), f(b) \in fC \Rightarrow c \in fC$, so $c = f(x)$ for some $x \in C$. □
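
The intermediate value theorem guarantees that an intermediate value is attained but does not say where. On an interval of $\mathbb{R}$ the point can be located by bisection, as in the following sketch for $n$-th roots. The function name `nth_root`, the bracketing interval $[0, \max(1, y)]$, and the tolerance are assumptions made for this illustration.

```python
def nth_root(y, n, tol=1e-12):
    """Approximate the positive n-th root of y > 0 by bisecting f(x) = x^n - y."""
    lo, hi = 0.0, max(1.0, y)          # lo^n = 0 < y <= hi^n
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2.0
        if mid ** n < y:
            lo = mid                   # the value y is attained to the right of mid
        else:
            hi = mid                   # the value y is attained at or to the left of mid
    return (lo + hi) / 2.0

print(nth_root(2.0, 2), nth_root(27.0, 3))
```

Continuity of $x \mapsto x^n$ is what ensures the shrinking bracket always contains a point where the value $y$ is attained.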
Exercises 5.7

1. Any two distinct points of a metric space are disconnected. More generally,
(a) any set of $N$ points ($N \geq 2$), (b) the union of two disjoint closed sets, are
disconnected.

2. The space of rational numbers $\mathbb{Q}$ is disconnected, e.g. using the open sets
$]-\infty, \sqrt{2}[ \,\cap\, \mathbb{Q}$ and $]\sqrt{2}, \infty[ \,\cap\, \mathbb{Q}$. In fact $\mathbb{Q}$ is totally disconnected.

3. Suppose that there is an $x \in X$ and an $r > 0$ such that $d(x, y) \neq r$ for all $y \in X$,
but there are points $y$ with $d(x, y) > r$. Show that $X$ is disconnected.

4. ► An open set (such as the whole metric space) is disconnected precisely when 
it consists of (at least) two disjoint open subsets. Find a connected set whose 
interior is disconnected. 

5. * Any two disjoint non-empty closed sets $A$ and $B$ are completely separated in
the sense that there are disjoint open sets $A \subseteq U$, $B \subseteq V$, $U \cap V = \emptyset$. (Hint:
use Exercise 3.12(17).)

6. ► A path is a continuous function $I \to X$ where $I$ is an interval in $\mathbb{R}$. Its
image is connected. Hence show that the parametric curves of geometry, such as
straight line segments, circles, ellipses, parabolas, and branches of hyperbolas
in $\mathbb{R}^2$, are connected.

7. (a) The function $f(x) := x^n$ is continuous on $\mathbb{R}$, for $n = 0, 1, \ldots$. Show that,
for any fixed $n \geq 1$, $x^n$ can be made arbitrarily large. Let $y$ be a positive real
number; use the intermediate value theorem to show that $\sqrt[n]{y}$ exists. More
generally every real monic polynomial $x^n + \cdots + a_1 x + a_0$ ($n \geq 1$), where
$a_0$ is negative or when $n$ is odd, has a root.

(b) Every continuous function $f : [0, 1] \to [0, 1]$ has a fixed point. (Hint:
consider $f(x) - x$.)

8. If $f : [0, 1]^2 \to \mathbb{R}$ is continuous and $f(a) < c < f(b)$ then there is an $x \in
[0, 1]^2$ such that $c = f(x)$ (assuming $[0, 1]^2$ is connected).

9. Suppose $X$ is connected and $f : X \to \mathbb{R}$ is continuous and locally constant, that
is, every $x \in X$ has a neighborhood taking the value $f(x)$. Then $f$ is constant
on $X$. (Hint: Show $f^{-1}f(a)$ is closed and open in $X$.)

10. $\mathbb{Q}$ has non-interval subsets with the interval property (e.g. $[0, \sqrt{2}[ \,\cap\, \mathbb{Q}$).

11. Use the intermediate value theorem to show that a 1-1 continuous function on
$[a, b]$ must be increasing or decreasing,

$$x \leq y \Rightarrow f(x) \leq f(y) \quad \text{OR} \quad x \leq y \Rightarrow f(x) \geq f(y).$$



5.2 Components 

It seems intuitively clear that every space is the disjoint union of connected subsets. 
To make this rigorous, let us present some more propositions that go some way in 
helping us show whether a set is connected, especially the principle that whenever 
connected sets intersect, their union is connected; this allows us to build connected
sets from smaller ones. 

Proposition 5.8 


If $C$ is connected then so is $C$ with some boundary points (in particular $\overline{C}$).


Proof Let $D$ be $C$ with the addition of some boundary points. Suppose it separates
as $D = A \cup B$ each covered exclusively by open sets $U$ and $V$. Then $C$ would also
split up in the same way, unless $C \subseteq U$ say. This cannot be the case, for let $x \in B$
be a boundary point covered by $V$. Then there is a ball $B_r(x) \subseteq V$ containing points
of $C$, a contradiction. Thus $D$ disconnected implies $C$ is disconnected. □

Theorem 5.9 


(i) If $A_i$, $B$ are connected sets and $\forall i,\ A_i \cap B \neq \emptyset$ then $B \cup \bigcup_i A_i$ is connected.
(ii) If $A_n$ are connected for $n = 1, 2, \ldots$, and $A_n \cap A_{n+1} \neq \emptyset$ then $\bigcup_n A_n$ is
connected.



Proof (i) Suppose the union $B \cup \bigcup_i A_i$ is disconnected and splits up into two parts
covered exclusively by open sets $U$ and $V$. Then $B$ would split up into the two parts
$B \cap U$ and $B \cap V$ were these to contain elements. But as $B$ is known to be connected,
one of these must be empty, say $B \cap U = \emptyset$. For any other $A := A_i$ that is partly
covered by $U$ (and there must be at least one) we get $A \cap V = \emptyset$ and $A \subseteq U$, for
the same reason. But then $A \cap B \subseteq U \cap B = \emptyset$, contradicting the assumptions.

In particular, note that if $A$, $B$ are connected and $A \cap B \neq \emptyset$, then $A \cup B$ is also
connected. But the statement is true even for an uncountable number of $A_i$.

(ii) If $C_N := \bigcup_{n=1}^{N} A_n$ is connected, then $C_{N+1} = C_N \cup A_{N+1}$ is also connected by
the first part of the theorem, since $C_N \cap A_{N+1} \neq \emptyset$. By induction $C_N$ is connected
for all $N$. As $A_1 \subseteq C_N$ for all $N$, it follows that $\bigcup_{N=1}^{\infty} C_N = \bigcup_{n=1}^{\infty} A_n$ is also
connected. □

The converses of both these statements are false, but the following holds: 

Proposition 5.10 


Given non-empty connected sets $A$, $B$,

$A \cup B$ is connected $\Leftrightarrow$ $\exists x \in A \cup B$, $\{x\} \cup A$ and $\{x\} \cup B$ are connected.


Proof Suppose no point $x \in A$ makes $\{x\} \cup B$ connected. That is, for each $x \in A$
there are two open sets which separate $\{x\} \cup B$. Call the set which contains $x$, $U_x$, and
the other one $V_x$. They would also separate $B$ unless $B \subseteq V_x$, and $U_x \cap B = \emptyset$. So
$\bigcup_x U_x$ is an open set containing $A$ but disjoint from $B$. If the same were to hold for
points in $B$, then there would be an open set containing $B$ but disjoint from $A$, making
$A \cup B$ disconnected. The converse is a special case of the previous proposition. □

Theorem 5.11 


A metric space partitions into disjoint closed maximal connected subsets, 
called components. Any connected set is contained in a component. 


By a maximal connected set is meant a connected set $C$ such that any $A \supseteq C$
($A \neq C$) is disconnected.

Proof The relation $x \sim y$, defined by $\{x, y\} \subseteq C$ for some connected set $C$, is
trivially symmetric; it is reflexive since $\{x, x\} = \{x\}$ is connected, and it is transitive
because if $x, y \in C_1$ and $y, z \in C_2$ then $x, z \in C_1 \cup C_2$, which is connected by
Theorem 5.9 as $y \in C_1 \cap C_2$. Moreover, another way of writing the relation $x \sim y$
is as

$$y \in \bigcup \{\, C \text{ connected} : x \in C \,\},$$

so that the equivalence class $[x]$ (called the component) of $x$ is the union of all the
connected sets containing $x$. What this implies is that any connected set $C$ that
contains $x$ must be part of the component of $x$. In addition, the component is connected
by Theorem 5.9 and it is maximally so, as no strictly larger connected set containing
$x$ can exist. In particular since $\overline{[x]}$ is connected it must be the case that $\overline{[x]} = [x]$ and
$[x]$ is closed (Proposition 2.16). □
Exercises 5.12

1. Show that $\mathbb{R}^2$ is connected by considering the radial lines all intersecting the
origin.

2. ► More generally, if there exists a path between any two points, then the metric
space is connected. (It is enough to find a path between any point and a single
fixed point; why?) Such a space is said to be path-connected.

3. The square $[0, 1]^2$ and the half-plane $]a, \infty[ \,\times\, \mathbb{R}$ are connected.

4. Any disk in $\mathbb{R}^2$ is path-connected. Do balls in a general metric space have to be
connected? Consider the space $X := \,]-\infty, -1[ \,\cup\, ]1, \infty[$ and find a ball in this
space which is not connected.

5. ► If $X$, $Y$ are connected spaces then so is $X \times Y$.

6. The set $\mathbb{R}^2 \setminus \{x\}$ is connected. But $\mathbb{R} \setminus \{x\}$ is disconnected. Deduce that $\mathbb{R}$ and
$\mathbb{R}^2$ are not homeomorphic.

Using the same idea, show that $[a, b]$, $[a, b[$ and $]a, b[$ are not homeomorphic
to each other, and neither is a circle to a parabola.

7. A connected metric space, such as R, has one component, itself. At the other 
extreme, in totally disconnected spaces, the components are the single points 
{ a }, e.g. Q and Z. 

8. If a subset of X has no boundary (so is closed and open) then it is the union of 
components of X. 

9. Components need not be open sets (e.g. in Q). 

10. A metric space in which $B_r(x)$ is connected for any $x$ and any $r$ sufficiently
small is said to be locally connected. Show that for a locally connected space $X$,

(a) the components are open in X, 

(b) any convergent sequence converges inside some component, 

(c) if X is also separable, then the components are countable in number. 


Chapter 6 

Compactness 


6.1 Bounded Sets 


Definition 6.1 


A set B is bounded when the distance between any two points in the set has 
an upper bound. 


$$\exists r > 0, \ \forall x, y \in B, \quad d(x, y) \leq r.$$

The least such upper bound is called the diameter of the set:

$$\operatorname{diam} B := \sup_{x, y \in B} d(x, y).$$


In everyday, but not very helpful, terms one can say that a bounded set does not 
“reach to infinity”, or even that it is “finite” in a geometric sense (the unit circle has 
an infinite number of points but is bounded in R 2 ). The characteristic properties of 
bounded sets are 

Proposition 6.2 


(i) Any subset of a bounded set is bounded.

(ii) The union of a finite number of bounded sets is bounded.


Proof (i) Let $B$ be a bounded set with $d(x, y) \leq r$ for any $x, y \in B$. In particular
this holds for $x, y$ in any subset $A \subseteq B$, so $A$ is bounded.

(ii) Given a finite number of bounded sets $B_1, \ldots, B_N$, with diameters $r_n$ respectively,
let $r := \max_n r_n$. Pick a representative point from each set, $a_n \in B_n$, and take
the maximum distance between any two, $\tilde{r} := \max_{m,n} d(a_m, a_n)$; it certainly exists
as there are only a finite number of such pairs. Now, for any two points $x, y \in \bigcup_n B_n$,
that is, $x \in B_i$, $y \in B_j$, and using the triangle inequality twice,

$$d(x, y) \leq d(x, a_i) + d(a_i, a_j) + d(a_j, y) \leq r_i + \tilde{r} + r_j \leq 2r + \tilde{r},$$

an upper bound for the distances between points in $\bigcup_{n=1}^{N} B_n$ is found.



□ 
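
The bound $2r + \tilde{r}$ from part (ii) can be checked on finite sets of real numbers. The following sketch is only an illustration; the helper names `diam` and `union_bound`, the sample sets, and the choice of first elements as representatives are assumptions for the example.

```python
from itertools import combinations, product

def diam(points, d):
    """Diameter of a finite set: the largest pairwise distance (0 for a single point)."""
    return max((d(p, q) for p, q in combinations(points, 2)), default=0.0)

def union_bound(sets, reps, d):
    """The bound 2r + r~ from the proof: r = largest diameter, r~ = largest distance between representatives."""
    r = max(diam(s, d) for s in sets)
    r_tilde = max(d(a, b) for a, b in product(reps, repeat=2))
    return 2 * r + r_tilde

d = lambda p, q: abs(p - q)
sets = [[0.0, 0.5, 1.0], [2.0, 2.25, 3.0], [10.0, 10.5]]
reps = [s[0] for s in sets]                  # one representative a_n per set
union = [p for s in sets for p in s]
print(diam(union, d), union_bound(sets, reps, d))
```

The bound is generally not sharp; it only certifies that the union has finite diameter, which is all the proposition needs.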


Examples 6.3 

1. In any metric space, finite subsets are bounded. In $\mathbb{N}$, only the finite subsets are
bounded (since $d(a_0, a_n) \leq N$ for all $n$ implies $n \leq N$). Consequently, $\mathbb{N}$, $\mathbb{Q}$, $\mathbb{R}$,
and $\mathbb{C}$ are all unbounded.

2. In a discrete metric space, every subset is bounded. A metric space may be “large”
(non-separable) yet be bounded.

3. ► A set $B$ is bounded $\Leftrightarrow$ it is a subset of a ball,

$$\exists r > 0, \ \exists a \in X, \quad B \subseteq B_r(a).$$

Proof Balls (and their subsets) are obviously bounded,

$$\forall x, y \in B_r(a), \quad d(x, y) \leq d(x, a) + d(y, a) < 2r.$$

Conversely, if a non-empty set is bounded by $R > 0$, pick any points $a \in X$ and
$b \in B$ to conclude $x \in B_r(a)$:

$$\forall x \in B, \quad d(x, a) \leq d(x, b) + d(b, a) < R + 1 + d(b, a) =: r.$$

4. The set $[0, 1[ \,\cup\, ]2, 3[ \,\subset \mathbb{R}$ is bounded because it can be covered by the ball $B_3(0)$,
or because it is the union of two bounded sets.

5. ► Boundedness is not necessarily preserved by continuous functions: If B is 
bounded and / is a continuous function, then fB need not be bounded. Worse, a 
set may be bounded in one metric space X, but unbounded in a homeomorphic 
copy Y. 

For example, N with the standard metric is unbounded, but its homeomorphic 
copy, N with the discrete metric, is bounded. 
Exercises 6.4

1. The set $[-1, 1[$ is bounded in $\mathbb{R}$ with diameter 2; in fact $\operatorname{diam}\,[a, b[ \,= b - a$.

2. Show that if $\operatorname{diam}(A) \leq r$, $\operatorname{diam}(B) \leq s$, and assuming $A \cap B \neq \emptyset$ then
$\operatorname{diam}(A \cup B) \leq r + s$.

3. Any closed ball $B_r[a] := \{x : d(x, a) \leq r\}$ is bounded; hence the closure of a
bounded set is bounded.

4. ► Cauchy sequences are bounded (Example 4.3(5)). So unbounded sequences 
cannot possibly converge. 

5. ► Prove that Lipschitz functions map bounded sets to bounded sets (Exercise 
4.17(3)). So equivalent metric spaces have corresponding bounded subsets. 


6.2 Totally Bounded Sets 

We have seen that boundedness is not an intrinsic property of a set, as it is not 
necessarily preserved by continuous functions. Let us try to capture the “finiteness” 
of a set with another definition: 

Definition 6.5 


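A subset $B \subseteq X$ is totally bounded when it can be covered by a finite number
of $\epsilon$-balls, however small their radii $\epsilon$,

$$\forall \epsilon > 0, \ \exists N \in \mathbb{N}, \ \exists a_1, \ldots, a_N \in X, \quad B \subseteq \bigcup_{n=1}^{N} B_\epsilon(a_n).$$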




Easy Consequences 

1. Any subset of a totally bounded set is totally bounded (the same $\epsilon$-cover of the
parent covers the subset).

2. A finite union of totally bounded sets is totally bounded (the finite collection of
$\epsilon$-covers remains finite).

3. A totally bounded set is bounded (it is a subset of a finite number of bounded 
balls). 
Examples 6.6

1. The interval $[0, 1]$ is totally bounded in $\mathbb{R}$ because it can be covered by the balls
$B_\epsilon(n\epsilon)$ for $n = 0, \ldots, N$, where $1/\epsilon - 1 < N \leq 1/\epsilon$.

2. Not all bounded sets are totally bounded. For example, in a discrete metric space,
any subset is bounded but only finite subsets are totally bounded (take $\epsilon < 1$).

3. ► A totally bounded space $X$ is separable.

Proof For each $n = 1, 2, \ldots$, consider finite covers of $X$ by balls $B_{1/n}(a_{i,n})$ and
let $A_n := \{a_{i,n}\}$ be the finite set of the centers, so $A := \bigcup_{n=1}^{\infty} A_n$ is countable.
For any $\epsilon > 0$ and any point $x \in X$, let $n \geq 1/\epsilon$, then $x$ is covered by some ball
$B_{1/n}(a_{i,n})$, i.e., $d(x, a_{i,n}) < \epsilon$, thus $\overline{A} = X$.

4. The center points $a_n$ of the definition may, without loss of generality, be
assumed to lie in $B$. Otherwise cover $B$ with balls $B_{\epsilon/2}(x_n)$, and take
representative points $a_n \in B \cap B_{\epsilon/2}(x_n)$ whenever non-empty; then $\bigcup_n B_\epsilon(a_n) \supseteq
B \cap \bigcup_n B_{\epsilon/2}(x_n) \supseteq B$.
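
The cover of $[0, 1]$ in the first example can be checked on sample points. The sketch below is illustrative; the function names, the value $\epsilon = 0.03$, and the grid of test points are assumptions, and a finite sample can only suggest, not prove, that the whole interval is covered.

```python
import math

def epsilon_net_unit_interval(eps):
    """Centers n*eps for n = 0..N, where N is the largest integer with N <= 1/eps."""
    N = math.floor(1.0 / eps)
    return [n * eps for n in range(N + 1)]

def covered(x, centers, eps):
    """True when x lies in some open ball B_eps(center)."""
    return any(abs(x - a) < eps for a in centers)

eps = 0.03
centers = epsilon_net_unit_interval(eps)
samples = [k / 1000.0 for k in range(1001)]    # test points in [0, 1]
print(len(centers), all(covered(x, centers, eps) for x in samples))
```

The number of centers grows like $1/\epsilon$, which stays finite for each fixed $\epsilon$, as total boundedness requires.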

Proposition 6.7 


A uniformly continuous function maps totally bounded sets to totally 
bounded sets. 


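Proof Let $f : X \to Y$ be a uniformly continuous function,

$$\forall \epsilon > 0, \ \exists \delta > 0, \ \forall x \in X, \quad f B_\delta(x) \subseteq B_\epsilon(f(x)).$$

Let $A$ be a totally bounded subset of $X$, covered by a finite number of balls $A \subseteq
\bigcup_{n=1}^{N} B_\delta(x_n)$. Then

$$fA \subseteq \bigcup_{n=1}^{N} f B_\delta(x_n) \subseteq \bigcup_{n=1}^{N} B_\epsilon(f(x_n)).$$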


□ 

A totally bounded set is geometrically ‘finite’, so an infinite sequence of points 
in a totally bounded set is caged in, so to speak, with nowhere to escape to: 

Theorem 6.8 


A set $B$ is totally bounded $\Leftrightarrow$ every sequence in $B$ has a Cauchy subsequence.
Proof Let the totally bounded set $K$ be covered by a finite number of balls of radius
1, and let $\{x_1, x_2, \ldots\}$ be an infinite subset of $K$. (If $K$ is finite, a selected sequence
must take some value $x_i$ infinitely often and so has a constant subsequence.) A finite
number of balls cannot cover an infinite set of points, unless at least one of the balls,
$B_1(a_1)$, has an infinite number of these points, say $\{x_{1,1}, x_{2,1}, \ldots\}$.

Now cover $K$ with a finite number of $\tfrac{1}{2}$-balls. For the same reason as above, at
least one of these balls, $B_{1/2}(a_2)$, covers an infinite number of points of $\{x_{n,1}\}$, say
the new subset $\{x_{1,2}, x_{2,2}, \ldots\}$. Continue this process forming covers of $\tfrac{1}{m}$-balls and
infinite subsets $\{x_{n,m}\}$ of $B_{1/m}(a_m)$. The sequence $(x_{n,n})$ is Cauchy, since for $m \leq n$,
both $x_{m,m}$ and $x_{n,n}$ are elements of the set $\{x_{1,m}, x_{2,m}, \ldots\}$, and so $d(x_{n,n}, x_{m,m}) <
\tfrac{2}{m} \to 0$ as $n, m \to \infty$.

For the converse, start with any $a_1 \in A$. If $B_\epsilon(a_1)$ covers $A$ then there is a
single-element $\epsilon$-ball cover. If not, pick $a_2$ in $A$ but not in $B_\epsilon(a_1)$. Continue like this to
get a sequence of distinct points $a_n \in A$ with $a_n \notin \bigcup_{i=1}^{n-1} B_\epsilon(a_i)$, all of which are
at least $\epsilon$ distant from each other. This process cannot continue indefinitely else we
get a sequence $(a_n)$ whose points are not close to each other, and so has no Cauchy
subsequence. So after some $N$ steps we must have $A \subseteq \bigcup_{i=1}^{N} B_\epsilon(a_i)$. □
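
The construction in the converse part of the proof is effectively a greedy algorithm. The sketch below runs it on a finite sample of $[0, 1]^2$; the function name `greedy_net`, the maximum metric, the grid, and $\epsilon = 0.25$ are assumptions chosen for the illustration.

```python
def greedy_net(points, eps, d):
    """Pick centers one by one, each lying outside the eps-balls around the earlier ones."""
    centers = []
    for p in points:
        if all(d(p, a) >= eps for a in centers):
            centers.append(p)       # p is not yet covered, so it becomes a new center
    return centers

d = lambda p, q: max(abs(p[0] - q[0]), abs(p[1] - q[1]))
grid = [(i / 20.0, j / 20.0) for i in range(21) for j in range(21)]   # sample of [0, 1]^2
centers = greedy_net(grid, 0.25, d)
print(len(centers))
```

The selected centers are at least $\epsilon$ apart from each other, and every sample point ends up within $\epsilon$ of some center, mirroring the two properties used in the proof.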

Exercises 6.9 

1. ► If $X$ and $Y$ are totally bounded metric spaces, then so is $X \times Y$.

(Hint: If $B_\epsilon(x_n)$ ($n = 1, \ldots, N$) cover $X$ and $B_\epsilon(y_m)$ ($m = 1, \ldots, M$) cover $Y$,
show that every point $(x, y) \in X \times Y$ lies in $B_{2\epsilon}(x_i, y_j)$ for some $i \leq N$, $j \leq M$.)

2. ► In $\mathbb{R}^N$ (and $\mathbb{C}^N$), a set is bounded $\Leftrightarrow$ it is totally bounded.

(Hint: Show that if $B$ is a bounded set in $\mathbb{R}^N$, with a bound $R > 0$, then $B$ is a
subset of $[-R, R]^N$, which is totally bounded by the previous exercise.)

3. The set of values of a Cauchy sequence is totally bounded.

4. The closure of a totally bounded set is totally bounded.

5. Let $B \subseteq Y \subseteq X$, then $B$ is totally bounded in $Y$ $\Leftrightarrow$ it is totally bounded in $X$
(Example 2.12(3)).

6. Any bounded sequence in $\mathbb{R}^N$ (or $\mathbb{C}^N$) contains a convergent subsequence. (Hint:
$x_n \in [-R, R]^N$ is totally bounded.)

7. A continuous function $f : X \to Y$, with $X$, $Y$ complete metric spaces, maps
totally bounded subsets of $X$ to totally bounded subsets of $Y$. (Hint: consider a
sequence in $fB$ for a totally bounded set $B \subseteq X$.)


6.3 Compact Sets 

In the presence of completeness, continuous functions preserve totally bounded sets.
Alternatively, we can strengthen the definition of boundedness even further to a
property that is preserved by continuous functions; such a property is compactness,
but it will emerge that compact sets are precisely the complete and totally bounded
subsets.

Definition 6.10 


A set $K$ is said to be compact when given any cover of balls (of possibly
unequal radii), there is a finite sub-collection of them that still cover the set (a
subcover),

$$K \subseteq \bigcup_i B_{\epsilon_i}(a_i) \;\Rightarrow\; \exists i_1, \ldots, i_N, \quad K \subseteq \bigcup_{n=1}^{N} B_{\epsilon_{i_n}}(a_{i_n}).$$


Examples 6.11 

1. Any finite set, including 0, is compact. 

2. The subset $[0, 1[ \,\subset \mathbb{R}$ is totally bounded but not compact. For example, the cover
using balls $B_{1-1/n}(0)$ for $n = 2, \ldots$ has no finite subcover. On the other hand,
we will soon see that the closed intervals $[a, b]$ are compact.

3. ► Compact metric spaces are totally bounded, and so bounded and separable 
(consider the cover by all e-balls). Thus, R and N are not compact. 
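
The claim in Example 2 that the balls $B_{1-1/n}(0)$ admit no finite subcover of $[0, 1[$ can be made concrete: any finite selection of indices leaves out an explicit point. The sketch below is illustrative, with hypothetical function names and an arbitrary finite selection `ns`.

```python
def uncovered_point(ns):
    """Given finitely many indices n >= 2, return a point of [0, 1[ outside every B_{1-1/n}(0)."""
    N = max(ns)
    return 1.0 - 1.0 / N      # |x| < 1 - 1/n fails for every n <= N

def in_ball(x, n):
    """Membership in the open ball B_{1-1/n}(0) of the real line."""
    return abs(x) < 1.0 - 1.0 / n

ns = [2, 3, 10, 250]
x = uncovered_point(ns)
print(x, [in_ball(x, n) for n in ns], in_ball(x, max(ns) + 1))
```

The missed point is still covered by a later ball of the full cover, so the infinite family does cover $[0, 1[$ while no finite part of it does.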

An equivalent formulation of compactness is the following. By an open cover is
meant a cover consisting of open sets, $K \subseteq \bigcup_i A_i$ ($A_i$ open subsets of $X$).

Proposition 6.12 


A set is compact $\Leftrightarrow$ any open cover of it has a finite subcover.


Proof Let open sets $A_j$ cover a compact set, $K \subseteq \bigcup_j A_j$. Each open set $A_j$ consists
of a union of balls. It follows that $K$ is included in a union of balls. By the definition
of compactness, there is a finite number of these balls $B_{\epsilon_1}(a_1), \ldots, B_{\epsilon_N}(a_N)$ that still
cover the set $K$. Each of these balls is inside one of the open sets, say $B_{\epsilon_i}(a_i) \subseteq A_{j_i}$,
and

$$K \subseteq \bigcup_{i=1}^{N} B_{\epsilon_i}(a_i) \subseteq \bigcup_{i=1}^{N} A_{j_i},$$

as claimed.

Conversely, suppose K is such that any open cover of it has a finite subcover. This 
holds in particular for a cover of (open) balls, so K is compact. □

We will soon strengthen the following proposition to show that compact sets are
complete, but the following proof is instructive, and remains valid in more general
topological spaces:

Proposition 6.13 

Compact sets are closed. 


Proof Let $K$ be compact and $x \in X \setminus K$. To show $x$ is exterior to $K$, we need to
surround it by a ball outside $K$. We know that $x$ can be separated from any $y \in K$
by disjoint open balls $B_{r_y}(x)$ and $B_{r_y}(y)$ (Proposition 2.5). Since $y \in B_{r_y}(y)$, these
latter balls cover $K$. But $K$ is compact, so there is a finite sub-collection of these
balls that still cover $K$,

$$K \subseteq B_{r_1}(y_1) \cup \cdots \cup B_{r_N}(y_N).$$

Now let $r := \min\{r_1, \ldots, r_N\}$; then $B_r(x) \cap K = \emptyset$ since

$$z \in B_r(x) \;\Rightarrow\; z \in B_{r_i}(x) \text{ for } i = 1, \ldots, N \;\Rightarrow\; z \notin B_{r_1}(y_1) \cup \cdots \cup B_{r_N}(y_N) \supseteq K.$$

Therefore, $x \in B_r(x) \subseteq X \setminus K$. □

Proposition 6.14 

A closed subset of a compact set is compact. 

A finite union of compact sets is compact. 


Proof (i) Let $F$ be a closed subset of a compact set $K$, and let the open sets $A_i$ cover
$F$; then

$$K \subseteq F \cup (X \setminus F) \subseteq \bigcup_i A_i \cup (X \setminus F).$$

The right-hand side is the union of open sets since $X \setminus F$ is open when $F$ is closed.
But $K$ is compact and therefore a finite number of these open sets are enough to
cover it,

$$K \subseteq \bigcup_{i=1}^{N} A_i \cup (X \setminus F), \quad\text{so}\quad F \subseteq \bigcup_{i=1}^{N} A_i.$$

(ii) Let the open sets $A_i$ cover the finite union of compact sets $K_1 \cup \cdots \cup K_N$. Then
they cover each individual $K_n$, and a finite number will then suffice in each case,
$K_n \subseteq \bigcup_k A_{i_k}$. For $n = 1, \ldots, N$, the collection of chosen $A_{i_k}$ remains finite, and
together they cover all the $K_n$. □

Compactness is strong enough that it is preserved by continuous functions; it is 
thus a truly intrinsic property of a set, as any homeomorphic copy of a compact set 
must also be compact. 

Proposition 6.15 


Continuous functions map compact sets to compact sets:

$f : K \subseteq X \to Y$ continuous and $K$ compact $\Rightarrow$ $fK$ compact.


Proof Let the sets $A_i$ be an open cover for $fK$,

$$fK \subseteq \bigcup_i A_i.$$

From this can be deduced

$$K \subseteq f^{-1}\bigcup_i A_i = \bigcup_i f^{-1}A_i.$$

But $f^{-1}A_i$ are open sets since $f$ is continuous (Theorem 3.7). Therefore the right-
hand side is an open cover of $K$. As $K$ is compact, a finite number of these open sets
will do to cover it,

$$K \subseteq \bigcup_{k=1}^{N} f^{-1}A_{i_k}.$$

It follows that there is a finite subcover, $fK \subseteq \bigcup_{k=1}^{N} A_{i_k}$, as required to show $fK$
compact. □

To summarize, 


Continuous functions preserve compactness, 

Uniformly continuous functions preserve total boundedness, 
Lipschitz continuous functions preserve boundedness. 


An immediate corollary is this statement from classical real analysis:

Corollary 6.16


Let $f : K \to \mathbb{R}$ be a continuous function on a compact space $K$. Then its
image $fK$ is bounded, and the function attains its bounds,

$$\exists x_0, x_1 \in K,\ \forall x \in K,\quad f(x_0) \le f(x) \le f(x_1).$$


Proof The image $fK$ is compact, and so bounded, $fK \subseteq B_R(0)$, i.e., $|f(x)| \le R$ for
all $x \in K$. Moreover compact sets are closed and so contain their boundary points. In
particular $fK$ contains $\inf fK$ and $\sup fK$ (Example 2.8(3)), i.e., $\inf fK = f(x_0)$,
$\sup fK = f(x_1)$ for some $x_0, x_1 \in K$. □

A property that holds locally, i.e., in a ball around any point, will often also hold in
a compact set by using a finite number of these balls. As an example of this, consider
a continuous function with compact domain. By the definition of continuity, any $x$
in the domain is surrounded by a small ball $B_{\delta_x}(x)$ on which the function varies by
at most a small fixed amount $\epsilon$; on a compact domain, a finite number of these balls
and radii suffice to cover the set, so a single $\delta$ can be chosen irrespective of $x$. More
formally,

Proposition 6.17 


Any continuous function from a compact space to a metric space,
$f : K \to Y$, is uniformly continuous.

If, moreover, $f$ is bijective, then $f$ is a homeomorphism.


Proof (i) By continuity of $f$, every $x \in K$ has a $\delta_x$ for which $fB_{\delta_x}(x) \subseteq B_\epsilon(f(x))$
(Theorem 3.7). Since the balls $B_{\delta_x/2}(x)$ cover $K$, there is a finite subcover, from
which can be chosen the smallest value $\delta$ of the radii $\delta_x$. Let $a, b \in K$ be any points with
$d(a, b) < \delta/2$. The point $a$ is covered by a ball $B_{\delta_x/2}(x)$ from the finite list. Indeed,
$B_{\delta_x}(x)$ covers $b$ too since

$$d(x, b) \le d(x, a) + d(a, b) < \delta_x/2 + \delta/2 \le \delta_x.$$

As both $a$ and $b$ belong to $B_{\delta_x}(x)$, their images under $f$ satisfy $f(a), f(b) \in
B_\epsilon(f(x))$, so that

$$d(f(a), f(b)) \le d(f(a), f(x)) + d(f(x), f(b)) < 2\epsilon.$$

This inequality was achieved with one $\delta$ independently of $a$ and $b$, so $f$ is uniformly
continuous.

(ii) If $f$ is continuous and onto, $Y = fK$ is compact. But when in addition it is also
1-1, it preserves open sets: if $A$ is open in $K$, then $K \setminus A$ is closed, hence compact,
in $K$; this is mapped 1-1 to the closed compact set $f(K \setminus A) = Y \setminus fA$, implying
that $fA$ is open in $Y$. This is precisely what is needed for $f^{-1}$ to be continuous, and
thus for $f$ to be a homeomorphism. □

We are now ready for some concrete examples, starting with that of $\mathbb{R}$, the simplest
non-trivial complete space.

Proposition 6.18 Heine-Borel’s theorem


The closed interval $[a, b]$ is compact in $\mathbb{R}$.


Proof Let $\bigcup_i A_i \supseteq [a, b]$ be an open cover of the closed interval. We seek to obtain
a contradiction by supposing there is no finite subcover. One of the two subintervals
$[a, (a + b)/2]$ and $[(a + b)/2, b]$ (and possibly both) does not admit a finite subcover:
call it $[a_1, b_1]$. Repeat this process of dividing, each time choosing a nested interval
$[a_n, b_n]$ of length $(b - a)/2^n$ which does not admit a finite subcover.

Now $(a_n)$ and $(b_n)$ are asymptotic Cauchy sequences, which must therefore converge
to the same limit, say, $a_n \to x$ and $b_n \to x$ (Proposition 4.2 and Theorem 4.5).
This limit $x$ is in the set $[a, b]$ (Proposition 3.4) and is therefore covered by some
open set $A_{i_0}$. As an interior point of it, $x$ can be surrounded by an $\epsilon$-ball (in this case,
an interval)

$$x \in B_\epsilon(x) \subseteq A_{i_0}.$$

But $a_n \to x$ and $b_n \to x$ imply that there is an $N$ such that $a_N, b_N \in B_\epsilon(x)$, and
so $[a_N, b_N] \subseteq B_\epsilon(x) \subseteq A_{i_0}$. This contradicts how $[a_N, b_N]$ was chosen not to be
covered by a finite number of $A_i$'s, so there must have been a finite subcover to start
with. □

The Heine-Borel theorem generalizes readily to arbitrary metric spaces. 

Theorem 6.19 


A set $K$ is compact $\Leftrightarrow$ $K$ is complete and totally bounded.


Proof Compact sets are totally bounded: Let $K$ be a compact set. For any $\epsilon > 0$,
cover $K$ with the balls $B_\epsilon(x)$ for all $x \in K$. This open cover has a finite sub-cover.

Compact sets are complete: Let $(x_n)$ be a Cauchy sequence which has no limit in
$K$, so that for each $x \in K$,

$$\exists \epsilon > 0,\ \forall N,\ \exists n \ge N,\quad d(x_n, x) \ge \epsilon.$$

For this $\epsilon$ (which may depend on $x$),

$$\exists M,\quad n, m \ge M \;\Rightarrow\; d(x_n, x_m) < \epsilon/2,$$

$$\epsilon \le d(x_n, x) \le d(x_n, x_m) + d(x_m, x) < \epsilon/2 + d(x_m, x),$$

$$\therefore\ m \ge M \;\Rightarrow\; d(x_m, x) > \epsilon/2.$$

For $m < M$, the distances $d(x_m, x)$ take only a finite number of values. Hence, for
each $x \in K$, there is a small enough ball $B_{r(x)}(x)$ which contains no points $x_n$ unless
$x_n = x$. This gives an open cover of $K$, which must have a finite sub-cover. But this
implies that the sequence takes a finite set of values and so must eventually repeat
and converge (Exercise 4.10(4)). In any case, there must be a limit in $K$.

Complete and totally bounded sets are compact: Let $K$ be a complete and totally
bounded set. Suppose it to be covered by open sets $V_i$, but that no finite number of
these open sets is enough to cover $K$. Since $K$ is totally bounded,

$$K \subseteq \bigcup_{i=1}^{N} B_1(y_i)$$

for some $y_i \in K$ (Example 6.6(4)). If each of these balls were covered by a finite
number of the open sets $V_i$, then so would $K$. So at least one of these balls needs an
infinite number of $V_i$'s to cover it; let us call this ball $B_1(x_1)$.

Now consider $B_1(x_1) \cap K$, also totally bounded. Once again, it can be covered by
a finite number of balls of radius $1/2$, one of which does not have a finite subcover,
say $B_{1/2}(x_2)$. Repeat this process to get a nested sequence of balls $B_{1/2^n}(x_n)$, with
$x_n \in K$, none of which has a finite subcover. The sequence $(x_n)$ is Cauchy since
$d(x_n, x_m) < 1/2^n$ (for $m > n$), and $K$ is complete, hence $x_n \to x$ in $K$.

But $x$ is covered by some open set $V_{i_0}$. Therefore there is an $\epsilon > 0$ such that

$$x \in B_\epsilon(x) \subseteq V_{i_0}.$$

Moreover since $1/2^n \to 0$ and $x_n \to x$, an $N$ can be found such that $1/2^N < \epsilon/2$
and $d(x_N, x) < \epsilon/2$, so that for $d(y, x_N) < 1/2^N$,

$$d(y, x) \le d(y, x_N) + d(x_N, x) < \epsilon,$$

i.e., $B_{1/2^N}(x_N) \subseteq B_\epsilon(x) \subseteq V_{i_0}$,
which contradicts the way that the balls $B_{1/2^n}(x_n)$ were chosen. □

Corollary 6.20

In a complete metric space, a subset $K$ is compact $\Leftrightarrow$
$K$ is closed and totally bounded.

In $\mathbb{R}^N$, $K$ is compact $\Leftrightarrow$ $K$ is closed and bounded.


Proof In a complete metric space, a subset is complete if, and only if, it is closed 
(Proposition 4.7). 

In the complete space $\mathbb{R}^N$, a set is totally bounded if, and only if, it is bounded
(Exercise 6.9(2)). Note carefully that this remains true whether the distance is
Euclidean, $D_1$, or $D_\infty$ (Example 2.2(6)). □

Theorem 6.21 Bolzano-Weierstraß property

In a metric space, a subset $K$ is compact

$\Leftrightarrow$ every sequence in $K$ has a subsequence that converges in $K$
$\Leftrightarrow$ every infinite subset of $K$ has a limit point in $K$.


Proof (i) A compact set is totally bounded, and so every sequence in it has a Cauchy
subsequence (Theorem 6.8). But compact metric spaces are also complete, implying
convergence of this subsequence in $K$.

(ii) Let $A$ be an infinite subset of $K$, and select a sequence of distinct terms $a_1, a_2, \ldots$
in $A$. Assuming that every sequence in $K$ has a convergent subsequence, then $a_{n_i} \to
a \in K$, as $i \to \infty$. For any ball $B_\epsilon(a)$, there are an infinite number of points
$a_{n_i} \in B_\epsilon(a)$, making $a$ a limit point of $A$ ($a$ can be equal to at most one of these
distinct points). Thus $K$ satisfies the Bolzano-Weierstraß property that every infinite
subset has a limit point in $K$.

(iii) Let $K$ have the Bolzano-Weierstraß property, let $(x_n)$ be any sequence in $K$
and let $A$ be the set of its values $\{x_0, x_1, x_2, \ldots\}$. If $A$ is infinite, then it has a limit
point $x \in K$ and so there is a convergent subsequence $x_{n_i} \to x$ with $x_{n_i} \in A$
(Proposition 3.4). Otherwise, if $A$ is finite, one can pick a constant subsequence. In
either case there is a (Cauchy) convergent subsequence in $K$.

This shows, firstly, that $K$ is totally bounded, and secondly, that every Cauchy
sequence in $K$ converges in $K$ (Exercise 4.10(10)), that is, $K$ is complete. □

Exercises 6.22 

1. A compact set that consists of isolated points is finite. 

2. In Z, and any discrete metric space, the compact subsets are finite. 

3. Show that $[0, 1] \cap \mathbb{Q}$ is closed and totally bounded in $\mathbb{Q}$ but not compact. (Hint:
first show that $[0, r[ \cap \mathbb{Q}$ is not compact when $r$ is irrational.)

4. (Cantor) Let $K_n$ be a decreasing nested sequence of non-empty compact sets.
If $\bigcap_n K_n = \emptyset$ then $X \setminus K_n$ ($n = 2, 3, \ldots$) form an open cover of $K_1$. Deduce
that $\bigcap_n K_n$ is compact and non-empty. Moreover, if $\operatorname{diam} K_n \to 0$ then $\bigcap_n K_n$
consists of a single point.

5. The Cantor set is compact, totally disconnected, and has no isolated points
(Exercise 2.20(7)). (In fact, it is the only non-empty space with these properties,
up to homeomorphism.)

6. The least distance between a compact set and a disjoint closed subset of a metric
space is strictly positive.

7. Suppose $K$ is a compact subset of $\mathbb{R}^2$ which lies in the half-plane $\{(x, y) : x > 0\}$.
Show that the open disks with centers $(x + x^{-1}, 0)$ and radii $x > 1$ cover the
half-plane, and deduce that $K$ is enclosed by a circle that does not meet the
$y$-axis.

8. The circle $S^1$ is compact; more generally, any continuous path $[0, 1] \to X$ has
a compact image.

9. Show that there can be no continuous map (i) $S^1 \to [0, 2\pi[$ which is onto, or
(ii) $S^1 \to \mathbb{R}$ which is 1-1.

10. A continuous function $f : \mathbb{R}^2 \to \mathbb{R}$ takes a maximum, and a minimum, value
on a continuous path $\gamma : [0, 1] \to \mathbb{R}^2$. For example, there is a maximum and a
minimum distance between points on the path and the origin. Give an example
to show that this is false if $[0, 1]$ is replaced by $]0, 1]$.

11. If $f : X \to K$ is bijective and continuous, and $K$ is compact, it does not follow
that $X$ is compact. Show that the mapping $f(\theta) := (\cos\theta, \sin\theta)$ for $0 \le \theta < 2\pi$,
is a counter-example.

12. Generalize the Heine-Borel theorem to closed rectangles $[a, b] \times [c, d]$ in $\mathbb{R}^2$, by
repeatedly dividing it into four sub-rectangles and adapting the same argument
of the proof. Can you extend this further to $\mathbb{R}^N$?

13. ► The spheres and the closed balls in $\mathbb{R}^N$ are compact.

14. Verify that $[a, b] \cap \mathbb{Q}$ is not compact by finding an infinite set of rational numbers
in $[a, b]$ that does not have a rational limit point.

15. Let $f : \mathbb{R}^N \to \mathbb{R}^N$ be a continuous function; consider the following iteration
$x_{n+1} := f(x_n)/|f(x_n)|$ of mapping by $f$ and normalizing. Show that there is a
convergent subsequence (one for each limit point), assuming $f(x_n) \ne 0$.

16. ► If $X$, $Y$ are compact metric spaces then so is $X \times Y$.

17. It is instructive to find an alternative proof that a continuous function maps a
compact set to a compact set, using the BW property.

Karl Weierstraß (1815-1897) After belatedly becoming a secondary
school mathematics teacher at 26 years, he privately
studied Abel’s subject of integrals and elliptic functions, until
in 1854 he wrote a paper on his work and was given an honorary
degree by the University of Königsberg. He then became
famous with his programme of “arithmetization” based upon
the construction of the real numbers, and of a function that is
continuous but nowhere differentiable.


Fig. 6.1 Weierstraß


6.4 The Space C(X, Y) 

We are now ready to turn the set of continuous functions $f : [0, 1] \to \mathbb{C}$ into a
complete metric space $C[0, 1]$, thereby giving one precise meaning to $f_n \to f$. To
appreciate the difficulty involved, note that if we were to define $f_n \to f$ to mean
pointwise convergence, that is, $f_n(x) \to f(x)$ for all $x \in [0, 1]$, then we would
get an incomplete space: The polynomials $x^n$ converge pointwise to a discontinuous
function. In fact we will consider the more general case of bounded functions from
any set to a metric space. A bounded function is one such that $\operatorname{im} f$ is bounded in the
codomain $Y$, that is,


$$\exists r > 0,\ \forall a, b \in X,\quad d_Y(f(a), f(b)) \le r.$$


Theorem 6.23 


The space of bounded functions from a set $X$ to a metric space $Y$ is itself
a metric space, with distance defined by

$$d(f, g) := \sup_{x \in X} d_Y(f(x), g(x)),$$

which is complete when $Y$ is.

It contains the closed subspace $C_b(X, Y)$ of bounded continuous functions,
when $X$ is a metric space.


Proof Distance: The distance is well-defined because if $\operatorname{im} f$ and $\operatorname{im} g$ are bounded,
then so is their union, and $d_Y(f(x), g(x)) \le \operatorname{diam}(\operatorname{im} f \cup \operatorname{im} g)$ for all $x \in X$.

That $d$ satisfies the distance axioms follows from the same properties for $d_Y$:

$$d(f, g) = 0 \;\Leftrightarrow\; \forall x \in X,\ d_Y(f(x), g(x)) = 0 \;\Leftrightarrow\; \forall x \in X,\ f(x) = g(x) \;\Leftrightarrow\; f = g,$$

$$\begin{aligned}
d(f, g) &= \sup_{x \in X} d_Y(f(x), g(x)) \\
&\le \sup_{x \in X} \bigl(d_Y(f(x), h(x)) + d_Y(h(x), g(x))\bigr) \\
&\le \sup_{x \in X} d_Y(f(x), h(x)) + \sup_{x \in X} d_Y(h(x), g(x)) \quad \text{(Exercise 3.5(7b))} \\
&= d(f, h) + d(h, g).
\end{aligned}$$

The axiom of symmetry $d(g, f) = d(f, g)$ is easily verified.

Completeness: Let $f_n : X \to Y$ be a Cauchy sequence of bounded functions,
then for every $x \in X$,

$$d_Y(f_n(x), f_m(x)) \le d(f_n, f_m) \to 0, \quad\text{as } n, m \to \infty$$

so $(f_n(x))$ is a Cauchy sequence in $Y$. When $Y$ is complete, $f_n(x)$ converges to,
say, $f(x)$.

Normally, this convergence would be expected to depend on $x$, being slower for
some points than others. In this case however, the convergence is uniform, as it is
$d(f_n, f_m) := \sup_x d_Y(f_n(x), f_m(x))$ which converges to 0. So given any $\epsilon > 0$
there is an $N$, such that $d_Y(f_n(x), f_m(x)) < \epsilon/2$ for any $n, m \ge N$ and any
$x \in X$. For each $x$, we can choose $m \ge N$, dependent on $x$ and large enough
so that $d_Y(f_m(x), f(x)) < \epsilon/2$, and this implies

$$\forall x \in X,\quad d_Y(f_n(x), f(x)) \le d_Y(f_n(x), f_m(x)) + d_Y(f_m(x), f(x)) < \epsilon \qquad (6.1)$$

for any $n \ge N$. Since this $N$ is independent of $x$, it follows that $d(f_n, f) \to 0$.

The function $f$ is bounded because for any $x, y \in X$, using (6.1),

$$d_Y(f(x), f(y)) \le d_Y(f(x), f_N(x)) + d_Y(f_N(x), f_N(y)) + d_Y(f_N(y), f(y)) < \epsilon + R_{f_N} + \epsilon \qquad (6.2)$$

with $N$ independent of $x$ and $y$, where $R_{f_N}$ is the diameter of $\operatorname{im} f_N$.

$C_b(X, Y)$ is closed: If $X$ is a metric space and $f_n$ are continuous, then this same
inequality (6.2) shows that $f$ is also continuous: if $\delta_N$ is small enough, then

$$d_X(x, y) < \delta_N \;\Rightarrow\; d_Y(f_N(x), f_N(y)) < \epsilon \;\Rightarrow\; d_Y(f(x), f(y)) < 3\epsilon,$$

so that $f_n \to f \in C_b(X, Y)$. □


Often we write $C(X)$ for the complete metric space $C_b(X, \mathbb{C})$.

The convergence $f_n \to f$ in $C_b(X, Y)$ is called uniform convergence. It is much
stronger than pointwise convergence $f_n(x) \to f(x)$, $\forall x \in X$; since $\|f_n - f\| =
\sup_x |f_n(x) - f(x)|$ converges to 0, $f_n$ approximates $f$ for large $n$ at all values
of $x$ uniformly.
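The gap between pointwise and uniform convergence can be seen numerically with the polynomials $x^n$ on $[0, 1]$ mentioned at the start of this section. The following is a minimal Python sketch, not part of the text; the helper names and the sample grid are assumptions made for illustration, and the sampled maximum only approximates (from below) the true supremum, which here equals 1 for every $n$.

```python
def sup_distance(f, g, points):
    """Approximate d(f, g) = sup |f(x) - g(x)| by a maximum over sample points."""
    return max(abs(f(x) - g(x)) for x in points)

def power(n):
    """The function x -> x^n on [0, 1]."""
    return lambda x: x ** n

def pointwise_limit(x):
    """Pointwise limit of x^n on [0, 1]: 0 for x < 1 and 1 at x = 1."""
    return 1.0 if x == 1 else 0.0

# Sample grid on [0, 1]; a finer grid gets closer to the true supremum.
grid = [k / 10000 for k in range(10001)]
```

At any fixed $x < 1$ the values $x^n$ become negligible, yet `sup_distance(power(n), pointwise_limit, grid)` stays large because points close to 1 lag behind; this is the sense in which the convergence is not uniform.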

Recall that continuous functions on a compact domain are uniformly continuous
(Proposition 6.17). Thus any ball of a fixed radius $\delta$ is mapped by a real-valued
continuous function $f$ into a ball of radius $\epsilon$. So, if $[a, b] \subset \mathbb{R}$ is partitioned into
intervals $[x_i, x_i + \delta[$, then $f$ maps each into an interval of length at most $2\epsilon$. Letting
$f$ take a constant value $f(x_i)$ on each interval gives a uniform approximation by a
“step” function. Of course, step functions are usually discontinuous. We can improve
the approximation by constructing a function consisting of straight-line segments
from one end-point $(x_i, f(x_i))$ to the next $(x_i + \delta, f(x_i + \delta))$. In fact, extending this
idea further, one can find quadratic or cubic polynomial fits, called “splines”, that are
widely used to approximate real continuous functions, but these spline polynomials
do not normally join up together as a single polynomial on $[a, b]$. Such a line of
argument does give a valid proof that $C[a, b]$ is separable; in fact one can even
generalize it to show that $C(K)$ is separable whenever $K$ is a compact metric space.
Stone’s theorem goes further and shows that the complex-valued functions on any
compact subset $K$ of $\mathbb{C}$ can be approximated by polynomials on $K$.
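The straight-line construction described above can be tried out directly. The following is a minimal Python sketch under stated assumptions: equally spaced end-points, a sampled estimate of the uniform distance, and illustrative names (`piecewise_linear`, `sup_error`) that do not come from the text.

```python
import math

def piecewise_linear(f, a, b, pieces):
    """Return the function joining the points (x_i, f(x_i)) by straight segments,
    where x_i = a + i*(b - a)/pieces."""
    h = (b - a) / pieces
    knots = [a + i * h for i in range(pieces + 1)]
    values = [f(x) for x in knots]

    def g(x):
        i = min(int((x - a) / h), pieces - 1)  # index of the segment containing x
        t = (x - knots[i]) / h                 # position within the segment
        return (1 - t) * values[i] + t * values[i + 1]

    return g

def sup_error(f, g, a, b, samples=10000):
    """Approximate sup |f - g| on [a, b] by sampling."""
    points = (a + k * (b - a) / samples for k in range(samples + 1))
    return max(abs(f(x) - g(x)) for x in points)
```

For instance, comparing `sup_error(math.sin, piecewise_linear(math.sin, 0.0, math.pi, 4), 0.0, math.pi)` with the same expression using 64 pieces shows the uniform error shrinking as the partition is refined.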

Theorem 6.24 Stone-Weierstraß


The polynomials (in $z$ and $\bar{z}$) are dense in $C(K)$, when $K \subset \mathbb{C}$ is compact.


Proof The proof is in five steps. The first two steps show that if a real-valued function
$f \in C(K)$ can be approximated by a polynomial $p$, then another polynomial can be
found that approximates $|f|$. Since the maximum of two functions $\max(f, g)$ can be
written in terms of $|f - g|$, it can also be approximated by polynomials if $f$ and $g$ can.
The fourth step, which is the main one, shows how a piecewise-linear approximation
of $f \in C(K)$ can be written in terms of max and min. Together these steps prove
that the polynomials $\mathbb{R}[x, y]$ are dense in the space of real continuous functions on
$K$. The final step extends this to complex-valued continuous functions.

(i) There are real polynomials that approximate $|x|$ on $-1 \le x \le 1$: For example,
let $q_1(x) := x^2$, $q_2(x) := 2x^2 - x^4$, ..., defined iteratively by

$$q_{n+1}(x) := q_n(x) + (x^2 - q_n(x)^2), \quad\text{starting from } q_0(x) := 0.$$

Let $y_n := q_n(x)$ for brevity, where $0 < x < 1$. Notice that

$$y_{n+1} - x = y_n - x - (y_n - x)(y_n + x) = (y_n - x)(1 - x - y_n).$$

When $|y_n - x| \le |y_1 - x| = x - x^2$, we get

$$0 < x^2 \le y_n \le 2x - x^2 < 1 \;\Rightarrow\; -x < 1 - x - y_n < 1 - x \;\Rightarrow\; |y_{n+1} - x| < c\,|y_n - x|$$

where $c := \max(x, 1 - x) < 1$. By induction, it follows that as $n \to \infty$,

$$|y_{n+1} - x| \le c^n |y_1 - x| \to 0.$$

The special cases $x = 0$ and $x = 1$ converge immediately to 0 and 1 respectively,
while $q_n(x) \to |x|$ when $x \in [-1, 0[$ by the symmetry of the expression in the
definition of $q_n$.

Moreover, the convergence is uniform in $x$ (certainly for $0 \le x < \epsilon$ and $1 - \epsilon <
x \le 1$, but for the other positive values of $x$ it takes at most $-2\log\epsilon/\epsilon$ iterates for
$|y_n - x| < c^n x < \epsilon$).
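The iteration of step (i) is easy to run. The following is a minimal Python sketch, assuming the recursion $q_0 = 0$, $q_{k+1} = q_k + (x^2 - q_k^2)$ as stated above; the function names and the sample grid are illustrative, and the sampled maximum is only an estimate of the uniform distance to $|x|$ on $[-1, 1]$.

```python
def q(n, x):
    """Evaluate q_n(x), where q_0 = 0 and q_{k+1} = q_k + (x^2 - q_k^2)."""
    y = 0.0
    for _ in range(n):
        y = y + (x * x - y * y)
    return y

def sup_error(n, samples=2000):
    """Approximate sup |q_n(x) - |x|| over [-1, 1] by sampling."""
    xs = [-1 + 2 * k / samples for k in range(samples + 1)]
    return max(abs(q(n, x) - abs(x)) for x in xs)
```

Evaluating `sup_error(n)` for increasing `n` shows the error decreasing, slowly, with the worst points lying near $x = 0$ where the contraction factor $c = \max(x, 1 - x)$ is closest to 1.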

(ii) Let $f \in C(K, \mathbb{R})$ ($f \ne 0$) with $c := \max_{x \in K} |f(x)| > 0$ (Corollary 6.16). Then
the scaled function $F := f/c$ takes values in $[-1, 1]$ so $|F|$ can be approximated
by $q \circ F$, where the polynomial $q$ approximates the function $x \mapsto |x|$ on $[-1, 1]$. If
the polynomial $p$ approximates $f$, it can be expected that the polynomial $cq \circ (p/c)$
ought to approximate $|f|$ in $C(K)$. This indeed holds since $q$ is uniformly continuous
on $[-1, 1]$,

$$\forall \epsilon > 0,\ \exists \delta > 0,\ \forall x \in K,\quad |F(x) - a| < \delta \;\Rightarrow\; |q \circ F(x) - q(a)| < \epsilon$$

so writing $P := p/c$,

$$\begin{aligned}
d(f, p) < c\delta \;&\Rightarrow\; d(F, P) < \delta \\
&\Rightarrow\; d(q \circ F, q \circ P) < \epsilon \\
d(|F|, q \circ P) &\le d(|F|, q \circ F) + d(q \circ F, q \circ P) \\
&\le d(|x|, q)_{C[0,1]} + d(q \circ F, q \circ P) < 2\epsilon \\
&\Rightarrow\; d(|f|, cq \circ P) \le 2c\epsilon.
\end{aligned}$$

(iii) For real functions, define $\max(f, g)(x) := \max(f(x), g(x))$ as well as
$\min(f, g)(x) := \min(f(x), g(x))$; a short exercise shows that

$$\max(f, g) = (f + g + |f - g|)/2, \qquad \min(f, g) = (f + g - |f - g|)/2.$$

But if $f$ and $g$ can be approximated by polynomials, then so can their sum and
difference, and by (ii), also $|f - g|$, and hence $\max(f, g)$ and $\min(f, g)$.
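The two identities in step (iii) can be checked pointwise. The following is a small Python sketch, not taken from the text; the names `pointwise_max` and `pointwise_min` are illustrative, and the code expresses the maximum and minimum through the absolute value exactly as in the formulas above.

```python
def pointwise_max(f, g):
    """max(f, g), written as (f + g + |f - g|) / 2."""
    return lambda x: (f(x) + g(x) + abs(f(x) - g(x))) / 2

def pointwise_min(f, g):
    """min(f, g), written as (f + g - |f - g|) / 2."""
    return lambda x: (f(x) + g(x) - abs(f(x) - g(x))) / 2
```

Since only sums, differences and one absolute value are involved, any polynomial approximation of $|f - g|$ immediately yields approximations of both functions.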

(iv) The real polynomials are dense among the real continuous functions $C(K, \mathbb{R})$:
Let $f \in C(K, \mathbb{R})$; for any $z \ne w$ in $K$, there is a linear function (a polynomial) $p_{z,w}$,
which agrees with $f$ at the points $z$, $w$, i.e., $p_{z,w}(z) = f(z)$, $p_{z,w}(w) = f(w)$.

For a fixed $z$, let

$$U_{z,w} := \{a \in K : p_{z,w}(a) < f(a) + \epsilon\} = (f - p_{z,w})^{-1}\,]{-\epsilon}, \infty[,$$

a non-empty open set (since $f - p_{z,w}$ is continuous and $U_{z,w}$ contains $z$). As
$w \in U_{z,w}$, we have $K \subseteq \bigcup_w U_{z,w}$; but $K$ is compact so it can be covered
by a finite number of subsets of this open cover, $K = U_{z,w_1} \cup \cdots \cup U_{z,w_M}$.
Let $g_z := \min(p_{z,w_1}, \ldots, p_{z,w_M}) < f + \epsilon$; it is continuous and can be approximated by
polynomials, from (iii).

Now let

$$V_z := \{a \in K : g_z(a) > f(a) - \epsilon\} = (f - g_z)^{-1}\,]{-\infty}, \epsilon[,$$

a non-empty open set ($f - g_z$ is continuous, and $z \in V_z$). Once again, $K \subseteq \bigcup_z V_z$, and
so $K = V_{z_1} \cup \cdots \cup V_{z_N}$. Let $h := \max(g_{z_1}, \ldots, g_{z_N})$, a continuous function which
can be approximated by polynomials, since $g_{z_i}$ can. Furthermore $f - \epsilon < h < f + \epsilon$;
and as this holds uniformly in $z$, we have $d(f, h) < \epsilon$.

(v) The set of polynomials in $z$ and $\bar{z}$ is dense in $C(K)$: If $f \in C(K)$ is complex-valued,
then it can be written as $f = u + iv$ with $u$, $v$ real-valued and continuous,
that can be approximated by real polynomials $p$, $q$, say. Then,

$$\forall z \in K,\quad |(p(z) + iq(z)) - (u(z) + iv(z))| \le |p(z) - u(z)| + |q(z) - v(z)|$$

$$\Rightarrow\; d(p + iq, u + iv) \le d(p, u) + d(q, v)$$

shows that $p + iq$ approximates $f$. But is, say, $x^2y + i(x^3 - xy^2)$ a polynomial in
$z$? Not necessarily: for example, take the polynomial $x$ itself and suppose $\operatorname{Re}(z) =
x = a_m z^m + \cdots + a_n z^n$ with $a_m \ne 0$ being the first non-zero coefficient; then
$a_m = \lim_{z \to 0} \operatorname{Re}(z)/z^m$, but $\operatorname{Re}(z)/z^m$ can be made real or imaginary, so $a_m = 0$, a
contradiction. Nevertheless, writing $x = (z + \bar{z})/2$ and $y = (z - \bar{z})/2i$ shows that
every polynomial $p(x, y) + iq(x, y)$ is a polynomial in $z$ and $\bar{z}$. □

The last theorem in this chapter characterizes the totally bounded sets of the space
$C(K, Y)$ of continuous functions on a compact space $K$ (in this case, $C(K, Y) =
C_b(K, Y)$), in terms of an explicit property of families of functions:

Marshall Stone (1903-1989) studied at Harvard under Birkhoff
(1926), with a thesis on ordinary differential equations and
orthogonal expansions (Hermite, etc.). He then worked on spectral
theory in Hilbert spaces, obtaining his big breakthrough in
1937 when he generalized the Weierstrass approximation theorem,
which led him to the Stone-Čech compactification theory.


Fig. 6.2 Stone


Definition 6.25 

A set $F \subseteq C(X, Y)$ of continuous functions on metric spaces is said to be
equicontinuous when

$$\forall \epsilon > 0,\ \exists \delta > 0,\ \forall f \in F,\ \forall x, x' \in X,\quad d(x, x') < \delta \;\Rightarrow\; d(f(x), f(x')) < \epsilon.$$

The equi in equicontinuous refers to the fact that $\delta$ is independent of $f \in F$.
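To get a feel for the definition, one can compare how much the members of a family vary over a fixed small distance. The following is a Python sketch under stated assumptions: the domain is $[0, 1]$, the supremum is estimated by sampling, and the names `oscillation` and `family_oscillation` as well as the two sample families are illustrative only. A finite family of continuous functions on a compact set is always equicontinuous, so the code can only exhibit a trend as more members are included.

```python
import math

def oscillation(f, delta, samples=200):
    """Approximate sup |f(x + delta) - f(x)| for x, x + delta in [0, 1],
    by sampling x on a grid."""
    xs = [k * (1 - delta) / samples for k in range(samples + 1)]
    return max(abs(f(x + delta) - f(x)) for x in xs)

def family_oscillation(family, delta):
    """Largest oscillation over a finite list of functions."""
    return max(oscillation(f, delta) for f in family)

# The powers x, x^2, ..., x^500, and 500 translates of sine (all 1-Lipschitz).
powers = [lambda x, n=n: x ** n for n in range(1, 501)]
shifts = [lambda x, c=c: math.sin(x + c) for c in range(500)]
```

With `delta = 0.01`, the translates of sine never vary by more than `delta`, whereas among the powers the variation near $x = 1$ approaches 1 as the exponent grows; this suggests that no single $\delta$ can serve the whole infinite family of powers.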

Theorem 6.26 Arzelà-Ascoli

Let $K$ and $Y$ be metric spaces, with $K$ compact. Then

$F \subseteq C(K, Y)$ is totally bounded $\Leftrightarrow$ $FK$ is totally bounded in $Y$ and $F$ is
equicontinuous.


Here $FK$ denotes the set $\{f(x) : f \in F, x \in K\}$.

Proof (i) Let $F$ be a totally bounded subset of $C(K, Y)$. This means that for any
$\epsilon > 0$, there are a finite number of continuous functions $f_1, \ldots, f_n \in F$ that are
close to within $\epsilon$ of every other function in $F$.

$FK$ is totally bounded: Let $\epsilon > 0$ be arbitrary. Each $f_iK$ is compact (Proposition 6.15),
so $\bigcup_{i=1}^{n} f_iK$ is totally bounded (Proposition 6.14 and Theorem 6.19),
and covered by a finite number of balls $B_\epsilon(y_j)$, $j = 1, \ldots, m$. This means that for
every $x \in K$ and $i = 1, \ldots, n$, $f_i(x)$ is close to some $y_j \in Y$. Combining this with
the fact that any function $f \in F$ is close to some $f_i$, gives

$$d(f(x), y_j) \le d(f(x), f_i(x)) + d(f_i(x), y_j) < 2\epsilon.$$

Thus each $f(x)$, where $f \in F$ and $x \in K$, is close to some $y_j$ ($j$ depends on $x$ and
$f$), in other words the finite number of balls $B_{2\epsilon}(y_j)$ cover $FK$.

$F$ is equicontinuous: We have seen previously that functions $f \in C(K)$, in particular
$f_i$, are uniformly continuous (Proposition 6.17): each $\epsilon > 0$ gives parameters
$\delta_i$. But we can say more. Since there are only a finite number of the functions $f_i$, the
minimum $\delta := \min_i \delta_i$ can be chosen such that

$$\forall \epsilon > 0,\ \exists \delta > 0,\ \forall i,\ \forall x, x' \in K,\quad d(x, x') < \delta \;\Rightarrow\; d(f_i(x), f_i(x')) < \epsilon.$$

But indeed this works for any $f \in F$:

$$\forall \epsilon > 0,\ \exists \delta > 0,\ \forall f \in F,\ \forall x, x' \in K,\quad d(x, x') < \delta \;\Rightarrow$$

$$d(f(x), f(x')) \le d(f(x), f_i(x)) + d(f_i(x), f_i(x')) + d(f_i(x'), f(x')) < 3\epsilon.$$

(ii) Let $FK$ be totally bounded and $F$ be equicontinuous. Then $FK$ can be covered
by a finite number of balls $B_\epsilon(y_j)$, $j = 1, \ldots, m$, i.e., any value $f(x)$ for $f \in F$
and $x \in K$ is close to some $y_j$ to within $\epsilon$. ‘$F$ is equicontinuous’ means that for
any $\epsilon > 0$, the distance $d(f(x), f(x')) < \epsilon$ for any $f \in F$, whenever $x$ and $x'$ are
sufficiently close together to within some $\delta > 0$ that does not depend on $x$, $x'$, or $f$.
We also require that $K$ is totally bounded, so that it can be covered by a finite number
of balls of diameter $\delta$. By removing any overlaps between the balls, we can replace
them by a finite partition of subsets $B_i$, $i = 1, \ldots, n$, each of diameter at most $\delta$.

For any $f \in F$ and $x \in B_i$, $f(x)$ is close to some $y_j$, $d(f(x), y_j) < \epsilon$. Indeed,
for any other $x' \in B_i$, we have

$$d(f(x'), y_j) \le d(f(x'), f(x)) + d(f(x), y_j) < 2\epsilon,$$

because $d(x, x') \le \delta$ and $F$ is equicontinuous. In other words, the function $f$ maps
each $B_i$ into a ball $B_{2\epsilon}(y_j)$ ($j$ depending on $i$), and the whole partitioned space $K$
into some of these balls. That is, we know $f$ to within the approximation $2\epsilon$, if we
know precisely how it maps each $B_i$ to which ball $B_{2\epsilon}(y_j)$; this is equivalent to
an “encoding” $i \mapsto j$ from $i = 1, \ldots, n$ to $j = 1, \ldots, m$. There are at most $m^n$
such maps, although not all need be represented by the functions in $F$. For those
combinations that are in fact represented by functions in $F$, select one from each and
denote it by $g_k$, $k = 1, \ldots, N$.

Going back to $f \in F$, with an encoding $i \mapsto j$, pick $g_k$ with the same encoding.
Then for any $x \in K$, pick $y_j$ close to $f(x)$ (and $g_k(x)$),

$$d(f(x), g_k(x)) \le d(f(x), y_j) + d(y_j, g_k(x)) < 4\epsilon$$

and taking the supremum over $x$, we have $d(f, g_k) \le 4\epsilon$. To summarize, the finite
number of functions $g_k$ are close to within $4\epsilon$ to any function $f \in F$, so that $F$ is
totally bounded. □

Exercises 6.27

1. Show that $C(X)$ contains the closed subset $C_b(X, \mathbb{R})$.

2. ► Uniform convergence, $f_n \to f$ in $C(X)$, implies pointwise convergence.
Plot the functions (i) $f(nx)$ on $[0, 1]$, where $f(x) := \max(0, x(1 - x))$, and (ii)
$x \mapsto 1/(1 + nx)$ on $]0, \infty[$; then show they converge pointwise to 0 as $n \to \infty$,
but not uniformly.

3. $\int f_n \to \int f$ and $f_n'(x) \to f'(x)$ need not hold if $f_n$ converges to $f$ pointwise.
Show that $x \mapsto nx^n$ and $\sin nx$ are counterexamples in $C(0, 1)$.

4. * (Dini) If $K$ is compact and $f_n \in C(K)$ is an increasing sequence of real-valued
functions, converging pointwise to $f \in C(K)$, then $f_n \to f$ in $C(K)$. (Hint:
Cover $K$ by balls $B_\delta(x)$ inside which $f - \epsilon < f_n \le f$ for $n \ge N_x$.)

5. * The space $C[a, b]$ is separable (using piecewise linear functions with kinks at
rational numbers), but $C(\mathbb{R}^+)$ is not.

6. The subspace of polynomials in $C[a, b]$ is not closed (and so is incomplete):
construct a sequence of polynomials that converges to a non-polynomial continuous
function in $C[0, 1]$.

7. If $J : X \to \tilde{X}$ and $L : Y \to \tilde{Y}$ are homeomorphisms then $f \mapsto L \circ f \circ J^{-1}$ is
a homeomorphism between $C(X, Y)$ and $C(\tilde{X}, \tilde{Y})$.

8. Follow the proof of the Stone-Weierstrass theorem to find a quadratic approximation
to the function

$$f(x) := \begin{cases} x & 0 \le x \le 1 \\ 0 & -1 \le x < 0 \end{cases}.$$

9. Suppose $f_n : K \to \mathbb{C}$ are continuous functions on a compact set $K$, converging
pointwise to $f$. By the Arzelà-Ascoli theorem, if $f_n$ are equicontinuous and
uniformly bounded ($|f_n(x)| \le c$ for all $x \in K$, $n \in \mathbb{N}$), then $f$ is also continuous.

10. A set of Lipschitz functions $f : [a, b] \to \mathbb{R}$ (Definition 4.14) with the same
Lipschitz constant $c$, $|f(x) - f(y)| \le c|x - y|$, form a totally bounded set in
$C[a, b]$. The fact that one $c$ works for all, implies that they are equicontinuous;
and their collective image in $\mathbb{R}$ is bounded ($|x - y| \le |b - a|$), hence totally
bounded.

11. Show that the sets of functions $\{\sin x, \sin 2x, \ldots\}$ and $\{x, x^2, x^3, \ldots\}$ on $[0, 1]$
are not equicontinuous.


Part II 

Banach and Hilbert Spaces 



Chapter 7 

Normed Spaces 


7.1 Vector Spaces 

It is assumed that the reader has already encountered vectors and matrices before but 
a brief summary of their theory is provided here for reference purposes. 

Definition 7.1 

A vector space $V$ over a field $F$ is a set on which are defined an operation of
vector addition $+ : V^2 \to V$ satisfying associativity, commutativity, zero and
inverse axioms, and an operation of scalar multiplication $F \times V \to V$ that
satisfies the respective distributive laws: for every $x, y, z \in V$ and $\lambda, \mu \in F$,

$$\begin{aligned}
x + (y + z) &= (x + y) + z, & \lambda(x + y) &= \lambda x + \lambda y, \\
x + y &= y + x, & (\lambda + \mu)x &= \lambda x + \mu x, \\
0 + x &= x, & (\lambda\mu)x &= \lambda(\mu x), \\
x + (-x) &= 0, & 1x &= x.
\end{aligned}$$



J. Muscat, Functional Analysis, DOI: 10.1007/978-3-319-06728-5_7,


Review 

1. $(-1)x = -x$, $-(-x) = x$, $0x = 0$, $\lambda 0 = 0$. There is little danger that the
zero scalar is confused with the zero vector, so no attempt is made to distinguish
them.

2. F is itself a vector space with scalar multiplication being plain multiplication. 
The smallest vector space is { 0 } (often written as 0). 

3. The product of vector spaces (over the same field), V x W, is a vector space 
with addition and scalar multiplication defined by 

(;:) + 

The zero in this case is (q) and the negatives are — ( * ) = (~'). By extension, 
:= F x • • ■ x F is a vector space. 

4. If $V$ is a vector space, then so is the set of functions $V^A := \{ f : A \to V \}$ (for any set $A$) with

$$(f + g)(x) := f(x) + g(x), \qquad (\lambda f)(x) := \lambda f(x).$$

The zero of $V^A$ is $0(x) := 0$, and the negatives are $(-f)(x) := -f(x)$.

5. A subset of a vector space $V$ which is itself a vector space with respect to the inherited vector addition and scalar multiplication is called a linear subspace. Since associativity and commutativity are obviously inherited properties, one need only check that the non-empty subset is "closed" under vector addition and scalar multiplication (then the zero $0 = 0x$ and inverses $-x = (-1)x$ are automatically in the set). There are always the trivial linear subspaces $\{0\}$ and $V$.

6. The intersection of linear subspaces is itself a linear subspace. 

7. An important example of a linear subspace is that generated by a set of vectors $A$,

$$[\![A]\!] := \{\, \lambda_1 a_1 + \cdots + \lambda_n a_n : a_i \in A,\ \lambda_i \in \mathbb{F},\ n \in \mathbb{N} \,\},$$

with the convention that $[\![\emptyset]\!] := \{0\}$. It is the smallest linear subspace that includes $A$, and we say that $A$ spans, or generates, $[\![A]\!]$. Each element of $[\![A]\!]$ is said to be a linear combination of the vectors in $A$.

8. The set $A$ is linearly independent when any vector $a \in A$ is not generated by the rest, $a \notin [\![A \setminus \{a\}]\!]$. (In particular $A$ does not contain $0$.) This is equivalent to saying that $\lambda a \in [\![A \setminus \{a\}]\!] \Leftrightarrow \lambda = 0$, or that for distinct $a_i \in A$,

$$\sum_{i=1}^n \lambda_i a_i = 0 \;\Rightarrow\; \lambda_i = 0, \quad i = 1, \ldots, n.$$

A vector generated by a linearly independent set $A$ has unique coefficients $\lambda_i$,

$$\sum_{i=1}^n \lambda_i a_i = \sum_{i=1}^n \mu_i a_i \;\Rightarrow\; \lambda_i = \mu_i, \quad i = 1, \ldots, n.$$

9. A basis is a minimal set of generating vectors; it must be linearly independent. 
Conversely, every generating set of linearly independent vectors is a basis. 

10. A vector space is said to be finite-dimensional when it is generated by a finite number of vectors, $V = [\![a_1, \ldots, a_n]\!]$ ($:= [\![\{a_1, \ldots, a_n\}]\!]$). The smallest such number of generating vectors is called the dimension of the vector space, denoted $\dim V$, and is equal to the number of vectors in a basis.

11. For example, $\mathbb{F}$ has dimension 1, because it is generated by any non-zero element, while $\dim\{0\} = 0$. The linear subspace generated by two linearly independent vectors $[\![x, y]\!]$ is 2-dimensional and is called a plane (passing through the origin).

12. The space of $m \times n$ matrices is a finite-dimensional vector space, generated by the $mn$ matrices $E_{ij}$ consisting of 0s everywhere with the exception of a 1 at row $i$ and column $j$.

13. If $V$ is finite-dimensional, then so is any linear subspace $W$, and $\dim W \le \dim V$ (strictly less if it is a proper subspace).

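14. We write $A + B := \{\, a + b : a \in A \text{ and } b \in B \,\}$ and $\lambda A := \{\, \lambda a : a \in A \,\}$ for any subsets $A, B \subseteq V$, e.g. $\mathbb{Q} + \mathbb{Q} = \mathbb{Q}$, $\mathbb{C} = \mathbb{R} + i\mathbb{R}$. Thus $\lambda(A \cup B) = \lambda A \cup \lambda B$, and $\lambda(A \cap B) = \lambda A \cap \lambda B$ ($\lambda \ne 0$); a non-empty set $A$ is a linear subspace when $\lambda A + \mu A \subseteq A$ for all $\lambda, \mu \in \mathbb{F}$. For brevity, $x + A$ is written instead of $\{x\} + A$; it is a translation of the set $A$ by the vector $x$. Care must be taken in interpreting these symbols: $A - A = \{\, a - b : a, b \in A \,\}$ is not usually $\{0\}$.

15. For non-empty subsets of $\mathbb{R}$, $\sup(A + B) \le \sup A + \sup B$, $\sup(\lambda A) = \lambda \sup A$ ($\lambda \ge 0$).

Proof Let $a + b \in A + B$, then $a \le \sup A$ and $b \le \sup B$, so $\sup A + \sup B$ is an upper bound of $A + B$, and hence at least as large as its least upper bound.

Similarly, for all $a \in A$, $a \le \sup A \Rightarrow \lambda a \le \lambda \sup A$, so $\sup(\lambda A) \le \lambda \sup A$. Hence, for $\lambda > 0$, $\sup A = \sup(\tfrac{1}{\lambda}\lambda A) \le \tfrac{1}{\lambda}\sup(\lambda A)$ and equality holds.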

16. The space $V$ is said to decompose as a direct sum of its subspaces $M$ and $N$, written $V = M \oplus N$, when $V = M + N$ and $M \cap N = 0$. For example,

$$\mathbb{R}^2 = [\![\tbinom{1}{0}]\!] \oplus [\![\tbinom{0}{1}]\!].$$

17. ► For vector spaces over $\mathbb{R}$ or $\mathbb{C}$, a subset $C$ is said to be convex when it contains the line segment between any two of its points,

$$\forall x, y \in C,\quad 0 \le t \le 1 \;\Rightarrow\; tx + (1 - t)y \in C,$$

equivalent to $sC + tC = (s + t)C$ for $s, t \ge 0$. This generalizes easily to $t_1 x_1 + \cdots + t_n x_n \in C$ when $t_1 + \cdots + t_n = 1$, $t_i \ge 0$, and $x_i \in C$. Clearly, linear subspaces are convex.

18. The intersection of convex sets is convex. There is a smallest convex set containing a set $A$, called its convex hull, defined as the intersection of all convex sets containing $A$, which equals

$$\{\, t_1 x_1 + \cdots + t_n x_n : x_i \in A,\ t_i \ge 0,\ t_1 + \cdots + t_n = 1,\ n \in \mathbb{N} \,\}.$$


Hausdorff’s Maximality Principle 

The Hausdorff Maximality Principle is a statement that can be used to possibly extend 
arguments that work in the finite or countable case to sets of arbitrary size. There are 
a few proofs in this book that make use of this principle; it is only needed to extend 
results to “uncountably infinite” dimensions. As such, it is mainly of theoretical 
value, and this section can be skipped if the main interest is in applications. 

Consider a collection $\mathcal{M}$ of subsets $M \subseteq X$ that satisfy a certain property $\mathcal{P}$. A chain $\mathcal{C} = \{M_\alpha\}$ of such sets is a nested sub-collection, meaning that for any two sets $M_\alpha, M_\beta \in \mathcal{C}$, either $M_\alpha \subseteq M_\beta$ or $M_\beta \subseteq M_\alpha$. A chain can contain any number of nested subsets, even uncountably many. A chain is called maximal when it cannot be added to by the insertion of any subset in $\mathcal{M}$. Hausdorff's maximality principle states that

Every chain in $\mathcal{M}$ is contained in some maximal chain in $\mathcal{M}$.

Hausdorff's maximality principle is often used to show there is a maximal set $E$ that satisfies some property $\mathcal{P}$ as follows: the empty chain can be extended to a maximal chain of sets $M_\alpha$; if it can be shown that the union of this chain $E := \bigcup_\alpha M_\alpha$ also satisfies $\mathcal{P}$, then there are no sets properly containing $E$ which satisfy $\mathcal{P}$, by the maximality of $\{M_\alpha\}$, i.e., $E$ is a maximal set in $\mathcal{M}$.

At the end of this chapter, it is proved that Hausdorff’s maximality principle 
implies the Axiom of Choice. Conversely, the Hausdorff Maximality principle can 
be proved from the Axiom of Choice (using the other standard set axioms), i.e., they 
are logically equivalent to each other, as well as to a number of other formulations 
such as Zorn’s lemma and the Well-Ordering principle. These statements are not 
constructive in the sense that they give no explicit way of finding the choice function 
or the maximal chain, but simply assert their existence. 

The purpose in introducing Hausdorff’s Maximality Principle here is to prove: 

Every vector space has a basis.

Proof Consider the collection of linearly independent sets of vectors in $V$. By Hausdorff's maximality principle, there is a maximal chain $\mathcal{M}$ of nested linearly independent sets $A_\alpha$. We show that $E := \bigcup_\alpha A_\alpha$ is linearly independent and spans $V$, hence a basis. If $\sum_{i=1}^n \lambda_i a_i = 0$ for distinct $a_i \in E$, then each of the vectors $a_i$ ($i = 1, \ldots, n$) belongs to some $A_{\alpha_i}$, and hence they all belong to some single $A_\alpha$ because these sets are nested in each other; but as $A_\alpha$ is linearly independent, $\lambda_i = 0$ for $i = 1, \ldots, n$. Thus $E$ is linearly independent. Suppose $E$ does not span $V$, meaning there is a vector $v \notin [\![E]\!]$, so that $E \cup \{v\}$ is linearly independent. As it properly contains $E$ and every $A_\alpha$, it contradicts the maximality of the chain $\mathcal{M}$. □


7.2 Norms 

We would like to consider vector spaces having a metric space structure. Any set can 
be given a metric, so this is quite possible, but it is more interesting to have a metric 
that is related to vector addition and scalar multiplication in a natural way. Taking 
cue from Euclid’s ideas of congruence, the properties that we have in mind are 

(a) translation-invariance: distances between vectors should remain the same when they are translated by the same amount,

$$d(x + a, y + a) = d(x, y),$$

(b) scaling-homogeneity: distances should scale in proportion when vectors are scaled,

$$d(\lambda x, \lambda y) = |\lambda|\, d(x, y).$$

These properties are valid only for special types of metric. When $d$ is translation invariant, then $d(x, y) = d(x - y, y - y) = d(x - y, 0)$ and $d$ becomes essentially a function of one variable, namely the norm function $\|x\| := d(x, 0)$ with $d(x, y) = \|x - y\|$. Conversely, any $d$ defined this way is translation invariant because $d(x + a, y + a) = \|x + a - y - a\| = d(x, y)$. This function is then scaling-homogeneous precisely when

$$\|\lambda x\| = d(\lambda x, 0) = |\lambda|\, d(x, 0) = |\lambda|\,\|x\|.$$

What properties does a norm need to have, for $d$ to be a distance? It is easy to see that

$$\begin{aligned}
d(x, z) \le d(x, y) + d(y, z) &\iff \|a + b\| \le \|a\| + \|b\|,\\
d(y, x) = d(x, y) &\iff \|-a\| = \|a\|,\\
d(x, y) \ge 0 &\iff \|a\| \ge 0,\\
d(x, y) = 0 \Leftrightarrow x = y &\iff \|a\| = 0 \Leftrightarrow a = 0,
\end{aligned}$$

where $a = x - y$, $b = y - z$. Of these, the symmetry property follows from scaling-homogeneity, while positivity follows from $0 = \|x - x\| \le \|x\| + \|-x\| = 2\|x\|$.

Definition 7.2

A normed space $X$ is a vector space over $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$ with a function called the norm $\|\cdot\| : X \to \mathbb{R}$ such that for any $x, y \in X$, $\lambda \in \mathbb{F}$,

$$\|x + y\| \le \|x\| + \|y\|, \qquad \|\lambda x\| = |\lambda|\,\|x\|, \qquad \|x\| = 0 \Leftrightarrow x = 0.$$

If necessary, norms on different spaces are distinguished by a subscript such as $\|\cdot\|_X$. A positive function that satisfies the first two axioms is termed a semi-norm.


Easy Consequences 

1. $\|x - y\| \ge \|x\| - \|y\|$.

2. $\|x_1 + \cdots + x_n\| \le \|x_1\| + \cdots + \|x_n\|$ (by induction).

Examples 7.3 

1. The absolute value functions, $|\cdot|$, for $\mathbb{R}$ and $\mathbb{C}$ are themselves norms, making these the simplest normed spaces.

2. ► The spaces $\mathbb{R}^N$ and $\mathbb{C}^N$ of geometric vectors have a Euclidean norm defined by

$$\|a\|_2 := \Big( \sum_{i=1}^N |a_i|^2 \Big)^{1/2}.$$

There are other possibilities, e.g. $\|a\|_1 := \sum_{i=1}^N |a_i|$, or $\|a\|_\infty := \max_i |a_i|$. Thus

$$\Big\|\tbinom{3}{-4}\Big\|_1 = 3 + 4 = 7, \qquad \Big\|\tbinom{3}{-4}\Big\|_2 = \sqrt{9 + 16} = 5, \qquad \Big\|\tbinom{3}{-4}\Big\|_\infty = \max(3, 4) = 4.$$

The different norms give the different distances already defined in Example 2.2(6).

3. ► A sequence of vectors $x_n = (a_{1n}, \ldots, a_{Nn})$ in $\mathbb{F}^N$ converges, $x_n \to a$ (in any of these norms), precisely when each coefficient converges in $\mathbb{F}$, $a_{in} \to a_i$ for $i = 1, \ldots, N$.

Proof Using the 2-norm, for any fixed $i$,

$$|a_{in} - a_i|^2 \le |a_{1n} - a_1|^2 + \cdots + |a_{Nn} - a_N|^2 = \|x_n - a\|_2^2,$$

so when the latter diminishes to 0, so does the left-hand side.

Conversely, if $a_{in} \to a_i$ for $i = 1, \ldots, N$, then

$$\|x_n - a\|_2 = \sqrt{|a_{1n} - a_1|^2 + \cdots + |a_{Nn} - a_N|^2} \to 0,$$

by continuity of the various constituent functions.

With minor changes, the same proof works for the other norms as well.

4. More generally, we can define the norm $\|a\|_p := \big( \sum_{i=1}^N |a_i|^p \big)^{1/p}$ for $p \ge 1$. Shortly, we will see that all these norms are equivalent in finite dimensions, so we usually take the most convenient ones, such as $p = 1, 2, \infty$.
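As a quick numerical illustration of the norms in Examples 2 and 4, the following minimal sketch evaluates $\|a\|_p$ for the vector $(3, -4)$. The function name `p_norm` is an assumption made for this illustration only.

```python
import math

def p_norm(a, p):
    """Return the p-norm of a finite sequence a, for 1 <= p <= infinity."""
    if p == math.inf:
        # The max-norm is the limiting case p = infinity.
        return max(abs(t) for t in a)
    return sum(abs(t) ** p for t in a) ** (1.0 / p)

v = (3, -4)
for p in (1, 2, 3, math.inf):
    print(p, p_norm(v, p))
```

For this vector the values decrease from 7 (at $p = 1$) through 5 (at $p = 2$) towards 4 (at $p = \infty$).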

5. ► Sequences: sequences can be added and multiplied by scalars, and form a vector space.

$$(a_0, a_1, \ldots) + (b_0, b_1, \ldots) := (a_0 + b_0, a_1 + b_1, \ldots),$$

$$\lambda(a_0, a_1, \ldots) := (\lambda a_0, \lambda a_1, \ldots).$$

The zero sequence is $(0, 0, \ldots)$ and $-(a_0, a_1, \ldots) = (-a_0, -a_1, \ldots)$.

The different norms introduced above generalize to sequences; the three most important normed sequence spaces are:

(a) $\ell^1 := \{\, (a_n) : \sum_{n=0}^\infty |a_n| < \infty \,\}$ with norm defined by

$$\|(a_n)\|_{\ell^1} := \sum_{n=0}^\infty |a_n|.$$

(b) $\ell^2 := \{\, (a_n) : \sum_{n=0}^\infty |a_n|^2 < \infty \,\}$ with norm defined by

$$\|(a_n)\|_{\ell^2} := \sqrt{\sum_{n=0}^\infty |a_n|^2}.$$

(c) $\ell^\infty := \{\, (a_n) : \exists c,\ |a_n| \le c \,\}$ with norm defined by $\|(a_n)\|_{\ell^\infty} := \sup_n |a_n|$.

For example, for the sequence $(1/n) = (1, \tfrac12, \tfrac13, \ldots)$,

$$\|(1/n)\|_{\ell^1} = \infty, \qquad \|(1/n)\|_{\ell^2} = \pi/\sqrt{6}, \qquad \|(1/n)\|_{\ell^\infty} = 1.$$

In each case there are two versions of the spaces, depending on whether $a_n \in \mathbb{R}$ or $\mathbb{C}$; the scalar field is then, correspondingly, real or complex. By default, we take the complex spaces as standard, unless specified otherwise.

Note carefully that an implicit assumption is being made here that adding two sequences in a space gives another sequence in the same space. This follows from the triangle inequality for the respective norm; it is left as an exercise for $\ell^1$ and $\ell^\infty$, but is proved for $\ell^2$ in the next proposition. See Proposition 9.12 for $\ell^p$.
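The three values quoted for the sequence $(1/n)$ can be checked numerically on truncations. The sketch below is only an illustration (the helper name `partial_norms` is made up here): the $\ell^1$ partial sums keep growing, roughly like $\log N$, while the $\ell^2$ values approach $\pi/\sqrt{6}$.

```python
import math

def partial_norms(N):
    """Norms of the truncated sequence (1, 1/2, ..., 1/N) in l^1, l^2 and l^infinity."""
    terms = [1.0 / n for n in range(1, N + 1)]
    l1 = sum(terms)                             # harmonic partial sum, unbounded in N
    l2 = math.sqrt(sum(t * t for t in terms))   # tends to pi / sqrt(6)
    linf = max(terms)                           # always 1
    return l1, l2, linf

for N in (10, 1000, 100000):
    l1, l2, linf = partial_norms(N)
    print(N, l1, l2, linf)

print("pi/sqrt(6) =", math.pi / math.sqrt(6))
```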

6. These spaces are different from each other. Not only do they contain different sequences, but convergence is different in each. For example, the sequences

$$\begin{aligned}
x_1 &:= (1, 0, 0, \ldots)\\
x_2 &:= (1/2, 1/2, 0, \ldots)\\
x_3 &:= (1/3, 1/3, 1/3, 0, \ldots)\\
&\;\;\vdots
\end{aligned}$$

are all in $\ell^1$, $\ell^2$, and $\ell^\infty$. They converge $x_n \to 0$ in $\ell^\infty$ as $n \to \infty$,

$$\|x_n\|_{\ell^\infty} = \sup\{\tfrac1n, 0\} = 1/n \to 0.$$

(Show that they also converge to 0 in $\ell^2$.) But they do not converge in $\ell^1$,

$$\|x_n\|_{\ell^1} = \frac1n + \cdots + \frac1n = 1 \not\to 0.$$

Thus, convergence of each coefficient is necessary, but not sufficient, for the convergence of $x_n$.
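The behaviour of the sequences $x_n$ in the three norms can be tabulated directly. In this small sketch (the names `x` and `norms` are chosen for the illustration), each $x_n$ is stored by its $n$ non-zero terms only, since the trailing zeros do not affect any of the norms.

```python
import math

def x(n):
    """The sequence x_n = (1/n, ..., 1/n, 0, 0, ...), stored by its n non-zero terms."""
    return [1.0 / n] * n

def norms(seq):
    """Return the (l^1, l^2, l^infinity) norms of a finitely supported sequence."""
    l1 = sum(abs(t) for t in seq)
    l2 = math.sqrt(sum(t * t for t in seq))
    linf = max(abs(t) for t in seq)
    return l1, l2, linf

for n in (1, 10, 100, 10000):
    print(n, norms(x(n)))
```

The $\ell^\infty$ norm is $1/n$ and the $\ell^2$ norm is $1/\sqrt{n}$, both tending to 0, while the $\ell^1$ norm stays at 1 (up to rounding).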

7. ► Functions $A \to \mathbb{F}$, where $A$ is an interval in $\mathbb{R}$, say, also form a vector space, with

$$(f + g)(x) := f(x) + g(x), \qquad (\lambda f)(x) := \lambda f(x),$$

and different norms can be defined for them as well (once again, there are two versions of each space, depending on whether the functions are real- or complex-valued):

(a) The space $L^1(A) := \{\, f : A \to \mathbb{C},\ \int_A |f(x)|\,dx < \infty \,\}$ with norm defined by

$$\|f\|_{L^1} := \int_A |f(x)|\,dx.$$

Or rather, this would be a norm, except that $\|f\|_{L^1} = \int_A |f(x)|\,dx = 0$ not when $f = 0$ but when $f = 0$ a.e. (Section 9.2). The failure of this axiom is not drastic, and those functions that are equal almost everywhere can be identified into equivalence classes to create a proper normed space, called Lebesgue space (Remark 2.23(1)). But to adopt a special notation for them, such as $[f]$, would be too pedantic to be useful; the symbol $f$, when used in the context of Lebesgue spaces, represents any function in its equivalence class. (The same comment holds for the next two spaces.)

(b) The space $L^2(A) := \{\, f : A \to \mathbb{C} : \int_A |f(x)|^2\,dx < \infty \,\}$, with norm defined by $\|f\|_{L^2} := \big( \int_A |f(x)|^2\,dx \big)^{1/2}$. More generally there are the $L^p(A)$ spaces for $p \ge 1$.

(c) The space

$$L^\infty(A) := \{\, f : A \to \mathbb{C} : f \text{ is measurable and } \exists c,\ |f(x)| \le c \text{ a.e. } x \,\},$$

with norm defined by $\|f\|_{L^\infty} := \sup_{x\ \mathrm{a.e.}} |f(x)|$ (i.e., the smallest $c$ such that $|f(x)| \le c$ a.e. $x$).

(d) The space $C_b(X, Y)$ of bounded continuous functions, defined previously (Theorem 6.23), is a normed space when $Y$ is, with

$$\|f\|_C := \sup_{x \in X} \|f(x)\|_Y.$$

(Check that $d$ as defined on $C_b(X, Y)$ is translation-invariant and scaling-homogeneous.) $C_b(X)$ is a linear subspace of $L^\infty(X)$, with the same norm. Note that $C_b(\mathbb{N}) = \ell^\infty$.

For example, on $A := [0, 2\pi]$, $\|\sin\|_{L^1} = 4$, $\|\sin\|_{L^2} = \sqrt{\pi}$, and $\|\sin\|_{L^\infty} = 1$. More details and proofs for the first three spaces can be found in Section 8.2.
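The three norms of $\sin$ on $[0, 2\pi]$ can be approximated numerically. The following sketch assumes a simple midpoint rule (the helper `midpoint_integral` and the grid size are choices made for this illustration, not part of the text); for a continuous function the essential supremum is the ordinary supremum, which is estimated here by a maximum over grid points.

```python
import math

def midpoint_integral(g, a, b, steps=100000):
    """Approximate the integral of g over [a, b] by the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

a, b = 0.0, 2.0 * math.pi
steps = 100000

l1 = midpoint_integral(lambda t: abs(math.sin(t)), a, b, steps)
l2 = math.sqrt(midpoint_integral(lambda t: math.sin(t) ** 2, a, b, steps))
# Grid maximum as an estimate of the supremum of |sin| on [a, b].
linf = max(abs(math.sin(a + k * (b - a) / steps)) for k in range(steps + 1))

print(l1, l2, linf)
print("sqrt(pi) =", math.sqrt(math.pi))
```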

8. ► When $X$, $Y$ are normed spaces over the same field, $X \times Y$ is also a normed space, with

$$\Big\| \tbinom{x}{y} \Big\| := \|x\|_X + \|y\|_Y.$$

The induced metric is $D_1$, defined previously for $X \times Y$ as metric spaces (Example 2.2(6)).

9. ► Suppose a vector space has two norms $\|\cdot\|$ and $|\!|\!|\cdot|\!|\!|$. Convergence with respect to one norm is the same as convergence with respect to the other norm when they are equivalent in the sense of metrics (Exercise 4.17(6)), i.e., there are positive constants $c, d > 0$,

$$c\|x\| \le |\!|\!|x|\!|\!| \le d\|x\| \quad \text{for all } x.$$

Proof Suppose the inequalities hold and $\|x_n - x\| \to 0$, then

$$|\!|\!|x_n - x|\!|\!| \le d\|x_n - x\| \to 0$$

as well; similarly if $|\!|\!|x_n - x|\!|\!| \to 0$ then $\|x_n - x\| \le c^{-1}|\!|\!|x_n - x|\!|\!| \to 0$.

Conversely, suppose the ratios $|\!|\!|x|\!|\!|/\|x\|$ approach 0 as $x$ varies in $X$. Then a sequence of vectors $x_n$ can be found such that $\|x_n\| = 1$ but $|\!|\!|x_n|\!|\!| < 1/n$, i.e., $x_n \to 0$ with respect to $|\!|\!|\cdot|\!|\!|$ but not with respect to $\|\cdot\|$. For this not to happen, $|\!|\!|x|\!|\!|/\|x\| \ge c > 0$, and similarly, $\|x\|/|\!|\!|x|\!|\!| \ge 1/d > 0$.

A good strategy to adopt when tackling a question about normed spaces is to try to answer it first for $\mathbb{R}$ or $\mathbb{C}$, then $\mathbb{C}^N$, then for a sequence space such as $\ell^\infty$ or $\ell^1$, and finally for a function space $C[0, 1]$, $L^\infty[0, 1]$, or $L^1(\mathbb{R})$. Theoretically, sequence spaces are useful as model spaces that are rich enough to exhibit most generic properties of normed spaces. But they are also indispensable in practice: a real-life function $f(t)$ is discretized, or digitized, into a sequence of numbers $f_n$ before it can be manipulated by algorithms.

Let us justify the claim that $\ell^2$ is a normed space, by showing that the standard norm $\|(a_n)\|_{\ell^2} := \sqrt{\sum_n |a_n|^2}$ satisfies the triangle inequality, even in infinite dimensions.

Proposition 7.4 Cauchy's inequality

For $a_n, b_n \in \mathbb{C}$,

$$\Big| \sum_{n=0}^\infty \bar{a}_n b_n \Big| \le \sqrt{\sum_{n=0}^\infty |a_n|^2}\, \sqrt{\sum_{n=0}^\infty |b_n|^2},$$

$$\sqrt{\sum_{n=0}^\infty |a_n + b_n|^2} \le \sqrt{\sum_{n=0}^\infty |a_n|^2} + \sqrt{\sum_{n=0}^\infty |b_n|^2}.$$

Proof (i) It is easy to show from $(a - b)^2 \ge 0$ that $ab \le (a^2 + b^2)/2$ for any real numbers $a, b$. Hence,

$$\Big( \sum_i a_i b_i \Big)^2 = \sum_{i,j} a_i b_i a_j b_j \le \sum_{i,j} (a_i^2 b_j^2 + a_j^2 b_i^2)/2 = \sum_i a_i^2 \sum_j b_j^2.$$

It follows that, for complex numbers $a_n, b_n$,

$$\Big| \sum_n \bar{a}_n b_n \Big| \le \sum_n |a_n|\,|b_n| \le \sqrt{\sum_n |a_n|^2}\, \sqrt{\sum_n |b_n|^2}.$$

(ii)

$$\begin{aligned}
\sum_n |a_n + b_n|^2 &\le \sum_n \big( |a_n|^2 + |b_n|^2 + 2|a_n b_n| \big)\\
&\le \sum_n |a_n|^2 + \sum_n |b_n|^2 + 2\sqrt{\sum_n |a_n|^2 \sum_n |b_n|^2}\\
&= \Big( \sqrt{\sum_n |a_n|^2} + \sqrt{\sum_n |b_n|^2} \Big)^2.
\end{aligned}$$

□

Thus for any two real sequences $x = (a_n)$, $y = (b_n)$ in $\ell^2$, one can define their 'dot product'

$$x \cdot y := \sum_{n=0}^\infty a_n b_n$$

whose convergence is assured by Cauchy's inequality. The identity $\|x\|^2 = x \cdot x$, familiar for Euclidean spaces, remains valid for $\ell^2$. Note that the two inequalities above can be written as $|x \cdot y| \le \|x\|\,\|y\|$ and $\|x + y\| \le \|x\| + \|y\|$, and that $x \cdot y$ need not be finite unless $x$ and $y$ are in $\ell^2$.
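The two inequalities $|x \cdot y| \le \|x\|\,\|y\|$ and $\|x + y\| \le \|x\| + \|y\|$ can be spot-checked on random finite real sequences. This is only a sanity check under stated assumptions, not a proof: the helper names, the seed, and the rounding tolerance `tol` are choices made for this sketch, and only real sequences of length at most 20 are sampled.

```python
import math
import random

def dot(x, y):
    """Dot product of two finite real sequences of equal length."""
    return sum(s * t for s, t in zip(x, y))

def norm2(x):
    """Euclidean (l^2) norm of a finite real sequence."""
    return math.sqrt(dot(x, x))

def check_inequalities(trials=1000, seed=0, tol=1e-12):
    """Return True if Cauchy's and the triangle inequality hold on all random samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 20)
        x = [rng.uniform(-1, 1) for _ in range(n)]
        y = [rng.uniform(-1, 1) for _ in range(n)]
        if abs(dot(x, y)) > norm2(x) * norm2(y) + tol:
            return False
        if norm2([s + t for s, t in zip(x, y)]) > norm2(x) + norm2(y) + tol:
            return False
    return True

print(check_inequalities())
```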

Since the metric of a normed space is translation invariant, it is not surprising that 
balls do not change their shape when translated. 

Proposition 7.5 

All balls in a normed space have the same convex shape:

$$B_r(x) = x + rB_1(0),$$

$$B_r(x) + B_s(y) = B_{r+s}(x + y), \qquad \lambda B_r(x) = B_{|\lambda| r}(\lambda x).$$

Proof The norm axioms can be recast as axioms for the shape of balls. The translation-invariance and scaling-homogeneity of the distance are equivalent to

$$\begin{aligned}
B_r(x + a) &= \{\, y : d(y, x + a) < r \,\} = \{\, y : d(y - a, x) < r \,\}\\
&= \{\, a + z : d(z, x) < r \,\} = B_r(x) + a,\\
\lambda B_1(0) &= \{\, \lambda y : \|y\| < 1 \,\} = \{\, z : \|z\| < |\lambda| \,\} = B_{|\lambda|}(0), \quad (\lambda \ne 0).
\end{aligned}$$

Combining the two gives $B_r(a) = a + rB_1(0)$, showing that all balls have the same shape as the ball of radius 1 centered at the origin.

The third norm axiom is equivalent to $\bigcap_{r>0} B_r(0) = \{0\}$, while the triangle inequality becomes $B_r(0) + B_s(0) = B_{r+s}(0)$ since

$$\|x\| < r \text{ and } \|y\| < s \;\Rightarrow\; \|x + y\| < r + s,$$

$$\|x\| < r + s \;\Rightarrow\; x = \frac{r}{r+s}x + \frac{s}{r+s}x \in B_r(0) + B_s(0).$$

Recasting this equation as $(r + s)B_1(0) = rB_1(0) + sB_1(0)$ for $r, s \ge 0$ shows that $B_1(0)$, and hence all other balls, are convex: for $x, y \in B_1(0)$ and $0 \le t \le 1$,

$$(1 - t)x + ty \in (1 - t)B_1(0) + tB_1(0) = B_1(0).$$

In particular, $B_r(x) + B_s(y) = x + rB_1(0) + y + sB_1(0) = B_{r+s}(x + y)$, $\lambda B_r(x) = \lambda x + \lambda rB_1(0) = B_{|\lambda| r}(\lambda x)$.

The unit ball is often denoted by $B_X := B_1(0)$ and takes a central role as representative of all other balls; it contains all the information about the norm of $X$.


Examples 7.6 

1. The boundary of a ball $B_r(x)$ is the sphere $S_r(x) := \{\, y \in X : d(x, y) = r \,\}$. Any point $y$ on the sphere has nearby points inside and outside the ball ($(1 - \epsilon)y$ and $(1 + \epsilon)y$, when the ball is centered at the origin). Thus $\overline{B_r(x)} = B_r[x] := \{\, y \in X : d(x, y) \le r \,\}$.

2. * Balls can have quite counter-intuitive properties. For example, consider the path of functions $f_t(x) := 2|x - t| - 1$ in $C[0, 1]$, starting from the function $f_0(x) = 2x - 1$ and ending at the function $f_1 = -f_0$. It lies on the unit sphere of $C[0, 1]$, but has a total length equal to the distance between $f_0$ and $f_1$.
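The claim about the path $f_t(x) = 2|x - t| - 1$ can be explored numerically. The sketch below makes several assumptions for illustration: the sup norm on $C[0, 1]$ is estimated by a maximum over a uniform grid, and the length of the path is estimated by summing the distances between consecutive sample functions $f_{t_k}$.

```python
def f(t):
    """The function f_t(x) = 2|x - t| - 1 on [0, 1]."""
    return lambda x: 2.0 * abs(x - t) - 1.0

# Uniform grid on [0, 1] used to estimate sup norms.
GRID = [k / 1000 for k in range(1001)]

def sup_norm(g):
    return max(abs(g(x)) for x in GRID)

def dist(g, h):
    return max(abs(g(x) - h(x)) for x in GRID)

ts = [k / 100 for k in range(101)]

# Every sampled f_t has norm 1, i.e. lies on the unit sphere.
on_sphere = all(abs(sup_norm(f(t)) - 1.0) < 1e-12 for t in ts)

# Polygonal estimate of the length of the path t -> f_t.
length = sum(dist(f(s), f(t)) for s, t in zip(ts, ts[1:]))

# Straight-line distance between the endpoints f_0 and f_1.
end_to_end = dist(f(0.0), f(1.0))

print(on_sphere, length, end_to_end)
```

Both the estimated length and the endpoint distance come out as 2 (up to rounding), consistent with $\|f_t - f_s\| = 2|t - s|$.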



Exercises 7.7 


1. Prove that $\|\cdot\|_1$ and $\|\cdot\|_\infty$ are norms. Which axiom does $\|\cdot\|_p$ fail when $p < 1$?

2. What do the unit balls of $\mathbb{R}^2$ in each norm of Example 7.3(2) look like?

3. The sequence $(1, 1, \ldots, 1, 0, 0, \ldots)$ is not a good approximation to the constant sequence $(1, 1, \ldots)$ in $\ell^\infty$; but $(1 - \epsilon, 1 - \epsilon, \ldots)$ is.

4. The norm axioms for $\ell^1$ and $\ell^\infty$ are, when interpreted correctly,

$$\begin{aligned}
\sum_n |a_n + b_n| &\le \sum_n |a_n| + \sum_n |b_n|, & \sup_n |a_n + b_n| &\le \sup_n |a_n| + \sup_n |b_n|,\\
\sum_n |\lambda a_n| &= |\lambda| \sum_n |a_n|, & \sup_n |\lambda a_n| &= |\lambda| \sup_n |a_n|,\\
\sum_n |a_n| = 0 &\iff \forall n,\ a_n = 0, & \sup_n |a_n| = 0 &\iff \forall n,\ a_n = 0.
\end{aligned}$$

Prove these, assuming any results about series (Section 7.5). What are they for $\ell^2$?

5. A set $A$ is bounded when there is a $c > 0$ such that $\forall x \in A$, $\|x\| \le c$ (Section 6.1). A non-trivial normed space is not bounded.

6. For any subset $A$, and $\epsilon > 0$, $A + B_\epsilon(0)$ is an open set containing $A$.

7. ► The norms $\|\cdot\|_1$, $\|\cdot\|_2$ and $\|\cdot\|_\infty$ are all equivalent on $\mathbb{R}^N$ since (prove!)

$$\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le N\|x\|_\infty.$$

But they are not equivalent for sequences or functions! Find sequences of functions that converge in $L^1[0, 1]$ but not in $L^\infty[0, 1]$, or vice-versa. Can sequences converge in $\ell^1$ but not in $\ell^\infty$?

8. * Minkowski semi-norm: Let $C$ be a convex set which is balanced, $e^{i\theta}C = C$ ($\forall \theta \in \mathbb{R}$), and such that $\bigcup_{r>0} rC = X$. Then

$$|\!|\!|x|\!|\!| := \inf\{\, r > 0 : x \in rC \,\}$$

is a semi-norm on X.
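Before proving the chain of inequalities in Exercise 7, it can help to test it numerically. The following sketch is a sanity check only, with helper names and a rounding tolerance `tol` chosen for the illustration; it also evaluates two vectors for which the inequalities become equalities.

```python
import math
import random

def norms(x):
    """Return (max-norm, 2-norm, 1-norm) of a finite real vector."""
    linf = max(abs(t) for t in x)
    l2 = math.sqrt(sum(t * t for t in x))
    l1 = sum(abs(t) for t in x)
    return linf, l2, l1

def chain_holds(x, tol=1e-12):
    """Check linf <= l2 <= l1 <= N * linf for the vector x, up to rounding."""
    linf, l2, l1 = norms(x)
    return linf <= l2 + tol and l2 <= l1 + tol and l1 <= len(x) * linf + tol

rng = random.Random(1)
samples = [[rng.uniform(-5, 5) for _ in range(rng.randint(1, 10))] for _ in range(1000)]
print(all(chain_holds(x) for x in samples))

# Equality cases: a coordinate vector, and a vector with equal entries.
print(norms([1.0, 0.0, 0.0]))
print(norms([1.0, 1.0, 1.0, 1.0]))
```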


7.3 Metric and Vector Properties 

By construction, normed spaces are metric spaces, as well as vector spaces. We 
can apply ideas related to both, in particular open/closed sets, convergence, com- 
pleteness, continuity, connectedness, and compactness, as well as linear subspaces, 
linear independence and spanning sets, convexity, linear transformations, etc. Many 
of these notions have better characterizations in normed spaces, as the following 
propositions attest. 


Proposition 7.8 


Vector addition,

$$(x, y) \mapsto x + y, \qquad X^2 \to X,$$

scalar multiplication,

$$(\lambda, x) \mapsto \lambda x, \qquad \mathbb{F} \times X \to X,$$

and the norm

$$x \mapsto \|x\|, \qquad X \to \mathbb{R},$$

are continuous.

Proof Vector addition and the norm are in fact Lipschitz maps,

$$\|(x_1 + y_1) - (x_2 + y_2)\| \le \|x_1 - x_2\| + \|y_1 - y_2\| = \|(x_1, y_1) - (x_2, y_2)\|_{X^2},$$

$$\big|\, \|x\| - \|y\| \,\big| \le \|x - y\|.$$

Scalar multiplication is continuous: for any $\epsilon > 0$, take $|\lambda - \mu|$ to be smaller than $\min\big(\epsilon/(3(1 + \|x\|)), 1\big)$ and $\|x - y\| < \min\big(\epsilon/(3(1 + |\lambda|)), 1\big)$, to get

$$\begin{aligned}
\|\lambda x - \mu y\| &\le \|\lambda x - \mu x\| + \|\mu x - \mu y\|\\
&= |\lambda - \mu|\,\|x\| + |\mu|\,\|x - y\|\\
&\le |\lambda - \mu|\,\|x\| + |\lambda|\,\|x - y\| + |\lambda - \mu|\,\|x - y\|\\
&< \epsilon.
\end{aligned}$$

□


Corollary 7.9 


When $(x_n)$ and $(y_n)$ converge,

$$\lim_{n\to\infty} (x_n + y_n) = \lim_{n\to\infty} x_n + \lim_{n\to\infty} y_n,$$

$$\lim_{n\to\infty} \lambda x_n = \lambda \lim_{n\to\infty} x_n,$$

$$\lim_{n\to\infty} \|x_n\| = \Big\| \lim_{n\to\infty} x_n \Big\|.$$

Of particular importance are closed linear subspaces, because they are "closed" not only with respect to the algebraic operations of addition $+$ and scalar multiplication $\lambda\,\cdot$, but also with respect to convergence $\to$.

Proposition 7.10 


If $M$ is a linear subspace of $X$, then so is $\overline{M}$.

$\overline{[\![A]\!]}$ is the smallest closed linear subspace containing $A$.

Proof (i) Let $x, y \in \overline{M}$, with sequences $x_n \in M$, $y_n \in M$ converging to them, $x_n \to x$ and $y_n \to y$ (Proposition 3.4). But $x_n + y_n$ and $\lambda x_n$ both belong to $M$, so

$$x + y = \lim_{n\to\infty} x_n + \lim_{n\to\infty} y_n = \lim_{n\to\infty} (x_n + y_n) \in \overline{M},$$

$$\lambda x = \lambda \lim_{n\to\infty} x_n = \lim_{n\to\infty} (\lambda x_n) \in \overline{M}.$$

Thus $\overline{M}$ is closed under vector addition and scalar multiplication. In particular this holds when $M$ is generated by $A$.

(ii) $[\![A]\!]$ is the smallest linear subspace containing $A$, and $\overline{[\![A]\!]}$ is the smallest closed set containing $[\![A]\!]$. So any closed linear subspace containing $A$ must also contain $[\![A]\!]$, and its closure $\overline{[\![A]\!]}$. □

Examples 7.11

1. The following sets are closed linear subspaces of their respective spaces:

(a) $A := \{\, (a_i) \in \ell^1 : \sum_{i=0}^\infty a_i = 0 \,\}$,

(b) $B := \{\, f \in C[a, b] : f(a) = f(b) \,\}$.

The proofs for closure (linearity is left as an exercise) depend on the following inequalities that hold when $a_n \to a$ in $\ell^1$, $a_n \in A$, and $f_n \to f$ in $C[a, b]$, $f_n \in B$,

$$\Big| \sum_{i=0}^\infty a_i \Big| = \Big| \sum_{i=0}^\infty a_{in} + \sum_{i=0}^\infty (a_i - a_{in}) \Big| \le \sum_{i=0}^\infty |a_i - a_{in}| = \|a - a_n\|_{\ell^1} \to 0,$$

$$f(a) = \lim_{n\to\infty} f_n(a) = \lim_{n\to\infty} f_n(b) = f(b).$$

2. * If $M$ and $N$ are closed subsets of a normed space, $M + N$ need not be closed (see also Exercise 7.14(5)).

(i) Let $f : X \to Y$ be a continuous function between normed spaces; let $M := \{\, (x, f(x)) : x \in X \,\}$, $N := \{\, (x, 0) : x \in X \,\}$; they are closed subsets of $X \times Y$ (prove!). But $M + N = \{\, (\tilde{x}, f(x)) : x, \tilde{x} \in X \,\}$ is closed if, and only if, $\operatorname{im} f$ is closed, which need not be the case. To take a specific example,

$$\{\, (x, 0) : x \in \mathbb{R} \,\} + \{\, (x, e^x) : x \in \mathbb{R} \,\} = \mathbb{R} \times ]0, \infty[.$$

(ii) This is true even if $M$, $N$ are linear subspaces. Let $M$ be the set of bounded sequences $(a_1, 0, a_2, 0, \ldots)$ whose even terms vanish, and let $N$ consist of bounded sequences of the type $(a_1, a_1/1, a_2, a_2/2^2, a_3, a_3/3^2, \ldots)$. They are both closed subspaces of $\ell^\infty$ (check!). Now consider

$$\begin{aligned}
x_n &:= (1, 1, 2, \tfrac12, 3, \tfrac13, 4, \tfrac14, \ldots, n, \tfrac1n, 0, 0, \ldots) \in N,\\
y_n &:= (1, 0, 2, 0, 3, 0, 4, 0, \ldots, n, 0, 0, 0, \ldots) \in M,\\
x_n - y_n &= (0, 1, 0, \tfrac12, 0, \tfrac13, 0, \tfrac14, \ldots, 0, \tfrac1n, 0, 0, \ldots) \in M + N.
\end{aligned}$$

$x_n - y_n$ converges to the bounded sequence $(0, 1, 0, \tfrac12, 0, \tfrac13, \ldots)$ which cannot be expressed as a vector in $M + N$.


Connected and Compact Subsets 

Recall that connected sets may be complicated objects in general metric spaces. This 
is still true in normed spaces, but at least for open subsets, connectedness reduces to 
path-connectedness, which is more intuitive and usually easier to prove.

Proposition 7.12

An open connected set in a normed space is path-connected.

Proof Let $C$ be a non-empty open connected set in $X$. Recall that "path-connected" means that any two points in $C$ can be joined by a continuous path $r : [0, 1] \to C$ starting at one point and ending at the other. Fix any $x \in C$, and let $P$ be the subset of $C$ consisting of those points that are path-connected to $x$. We wish to show that $P = C$.

$P$ has no boundary in $C$: Given any boundary point $z$ of $P$, there is a ball $B_\epsilon(z) \subseteq C$ since $C$ is open, and thus a point $y \in P$ in the ball. This means that there is a path $r$ from $x$ to $y$. In normed spaces, it is obvious that balls, like all convex sets, are path-connected (by straight paths). So we can extend the path $r$ to one that starts from $x$ and ends at any other $w \in B_\epsilon(z)$, simply by adjoining the straight line at the end. More rigorously, the function $\tilde{r} : [0, 1] \to C$ defined by

$$\tilde{r}(t) := \begin{cases} r(2t) & t \in [0, \tfrac12]\\ y + (2t - 1)(w - y) & t \in ]\tfrac12, 1] \end{cases}$$

is continuous. So $z$ is surrounded by points of $P$, a contradiction.

But a connected set such as $C$ cannot contain a subset, such as $P$, without a boundary (Proposition 5.3), unless $P = \emptyset$ (which is not the case here) or $P = C$. □

There is quite a bit to say about bounded and totally bounded sets. As we will 
see later on, they are the same in finite dimensional normed spaces, but in infinite 
dimensional ones, no open set can be totally bounded, although balls are bounded sets. 
For now, let us show that translations and scalings of bounded and totally bounded 
sets remain so. 

Proposition 7.13 

If $A$, $B$ are both bounded, totally bounded, or compact sets, then so are, respectively, $\lambda A$ and $A + B$.

Proof Proposition 7.5 is used throughout the following.

Boundedness: If $A \subseteq B_r(x)$ and $B \subseteq B_s(y)$, then

$$\lambda A \subseteq \lambda B_r(x) = B_{|\lambda| r}(\lambda x),$$

$$A + B \subseteq B_r(x) + B_s(y) = B_{r+s}(x + y).$$

Total boundedness:

$$\lambda A \subseteq \lambda \bigcup_{i=1}^N B_{\epsilon/|\lambda|}(x_i) = \bigcup_{i=1}^N B_\epsilon(\lambda x_i),$$

$$A + B \subseteq \bigcup_{i=1}^N B_{\epsilon/2}(x_i) + \bigcup_{j=1}^M B_{\epsilon/2}(y_j) = \bigcup_{i,j} B_\epsilon(x_i + y_j).$$

Compactness: If $A$ is compact, then scalar multiplication, being continuous, sends it to the compact set $\lambda A$ (Proposition 6.15). If $B$ is also compact, then $A \times B$ is compact (Exercise 6.22(16)), and vector addition, being continuous, maps it to the compact set $A + B$. □

Exercises 7.14 

1. Show that the following sets are closed subspaces of their respective spaces:

(a) $\{\, (a_i) \in \ell^\infty : a_0 = 0 \,\}$,

(b) $\{\, (a_i) \in \ell^2 : a_1 = a_3 \text{ and } a_0 = \sum_{i=1}^\infty a_i/i \,\}$,

(c) $\{\, f \in C[0, 1] : \int_0^1 f = 0 \,\}$.

2. The set of polynomials in $x$ forms a linear subspace of $C[0, 1]$. Its dimension is infinite because the elements $1, x, x^2, \ldots$ are linearly independent. Is it closed, or if not, what could be the closure of the polynomials in this space?

3. The convex hull of a closed set need not be closed; a counterexample is given by $(\mathbb{R} \times \{0\}) \cup \{(0, 1)\}$. But the closure of a convex set $C$ is convex.

4. Line segments are path-connected; so linear subspaces and convex subsets (such as balls) are connected.

5. The continuity of $+$ and $\lambda\,\cdot$ implies that $\lambda \overline{A} = \overline{\lambda A}$ and $\overline{A} + \overline{B} \subseteq \overline{A + B}$. Find an example to show that equality need not necessarily hold.


7.4 Complete and Separable Normed Vector Spaces 
Definition 7.15 


When the induced metric $d(x, y) := \|x - y\|$ is complete, the normed space is called a Banach space.

Stefan Banach (1892-1945) After WW1, at 24 years, a chance event led him to meet Steinhaus, who had studied under Hilbert in 1911, and was then at Krakow university. His 1920 thesis on abstract normed real vector spaces earned him a post at the University of Lwow; working mostly in the "Scottish cafe", he continued research on "linear operations", where he introduced weak convergence and proved various theorems such as the Hahn-Banach, Banach-Steinhaus, Banach-Alaoglu, his fixed-point theorem, and the Banach-Tarski paradox.

Fig. 7.1 Banach


Examples 7.16 

1. ► $\mathbb{R}^N$ and $\mathbb{C}^N$ are separable Banach spaces. It is later shown (Section 9.1.4) that the sequence spaces $\ell^p$ and the Lebesgue function spaces $L^p[0, 1]$ ($1 \le p < \infty$) are also separable Banach spaces, but $\ell^\infty$ is a non-separable Banach space (Theorem 9.1).

2. A closed linear subspace of a Banach space is itself a Banach space (Proposition 4.7).

3. ► When $X$, $Y$ are Banach spaces over the same field, so is $X \times Y$ (Proposition 4.7).

4. $C_b(X, Y)$ is a Banach space whenever $Y$ is (Theorem 6.23).

5. Not every normed space is complete (when infinite dimensional).

(i) The set $c_{00}$ of finite sequences $(a_0, \ldots, a_n, 0, 0, \ldots)$, $n \in \mathbb{N}$, is an incomplete linear subspace of $\ell^\infty$. For example, the vectors $(1, 0, 0, \ldots)$, $(1, \tfrac12, 0, 0, \ldots)$, ..., $(1, \tfrac12, \ldots, \tfrac1n, 0, 0, \ldots)$, ..., form a Cauchy sequence which does not converge in $c_{00}$.

(ii) Take the vector space of continuous functions $C[-1, 1]$ with the 1-norm $\|f\| := \int_{-1}^1 |f(x)|\,dx$. This is indeed a norm but it is not complete on that space. For consider the sequence of continuous functions defined by

$$f_n(x) := \begin{cases} 0 & -1 \le x < 0\\ nx & 0 \le x \le 1/n\\ 1 & 1/n < x \le 1 \end{cases}$$

It is Cauchy:

$$\|f_n - f_m\| = \int_{-1}^1 |f_n(x) - f_m(x)|\,dx = \frac12 \Big| \frac1n - \frac1m \Big| \to 0, \quad \text{as } n, m \to \infty,$$

but were it to converge to some $f \in C[-1, 1]$, i.e., $\int_{-1}^1 |f_n(x) - f(x)|\,dx \to 0$, then

$$\int_{-1}^0 |f(x)|\,dx = 0 = \int_0^1 |1 - f(x)|\,dx,$$

so that $f(x) = 0$ on $[-1, 0[$ and $f(x) = 1$ on $]0, 1]$, implying it is discontinuous. Similarly the set $C[a, b]$ is not closed as a linear subspace of $L^2[a, b]$.
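The formula $\|f_n - f_m\| = \tfrac12 |1/n - 1/m|$ for the ramp functions $f_n$ of Example 5(ii) can be checked numerically. The sketch below assumes a midpoint rule on $[-1, 1]$ (the names `ramp` and `one_norm_distance` and the step count are choices made for this illustration).

```python
def ramp(n):
    """The function f_n of the example: 0 on [-1, 0), n*x on [0, 1/n], 1 on (1/n, 1]."""
    def fn(x):
        if x < 0:
            return 0.0
        if x <= 1.0 / n:
            return n * x
        return 1.0
    return fn

def one_norm_distance(n, m, steps=40000):
    """Midpoint-rule estimate of the integral of |f_n - f_m| over [-1, 1]."""
    g, k = ramp(n), ramp(m)
    h = 2.0 / steps
    total = 0.0
    for j in range(steps):
        x = -1.0 + (j + 0.5) * h
        total += abs(g(x) - k(x))
    return h * total

for n, m in ((2, 4), (10, 20), (100, 200)):
    print(n, m, one_norm_distance(n, m), 0.5 * abs(1.0 / n - 1.0 / m))
```

The distances shrink as $n, m$ grow, in line with the sequence being Cauchy, even though its pointwise limit is a discontinuous step function.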

Proposition 7.17 

Every normed space can be completed to a Banach space. 

Proof Let $\widetilde{X}$ be the completion of the normed space $X$ (Theorem 4.6). We need to prove that vector addition, scalar multiplication and the norm on $X$ can be extended to $\widetilde{X}$. Using the notation of Theorem 4.6, let $\boldsymbol{x} = [x_n]$, $\boldsymbol{y} = [y_n]$ be elements of $\widetilde{X}$, with $(x_n)$, $(y_n)$ Cauchy sequences in $X$. Since

$$\begin{aligned}
\|x_n + y_n - x_m - y_m\| &\le \|x_n - x_m\| + \|y_n - y_m\| \to 0,\\
\|\lambda x_n - \lambda x_m\| &= |\lambda|\,\|x_n - x_m\| \to 0,\\
\big|\, \|x_n\| - \|x_m\| \,\big| &\le \|x_n - x_m\| \to 0,
\end{aligned}$$

as $n, m \to \infty$, we find that $(x_n + y_n)$, $(\lambda x_n)$ and $(\|x_n\|)$ are all Cauchy sequences. For the same reasons, if $(x'_n)$ is asymptotic to $(x_n)$, and $(y'_n)$ to $(y_n)$, then $(x'_n + y'_n)$ and $(x_n + y_n)$, $(\lambda x'_n)$ and $(\lambda x_n)$, and $\|x'_n\|$ and $\|x_n\|$, are asymptotic to each other, respectively. So we can define

$$\boldsymbol{x} + \boldsymbol{y} := [x_n + y_n], \qquad \lambda \boldsymbol{x} := [\lambda x_n], \qquad \|\boldsymbol{x}\| := \lim_{n\to\infty} \|x_n\|.$$

Note that $d(\boldsymbol{x}, \boldsymbol{y}) = \|\boldsymbol{x} - \boldsymbol{y}\|$. It is easy to check that they give a legitimate vector addition, scalar multiplication and a norm; the required axioms follow from the same properties in $X$ and the continuity of these operations, e.g.

$$\|\boldsymbol{x} + \boldsymbol{y}\| = \lim_{n\to\infty} \|x_n + y_n\| \le \lim_{n\to\infty} (\|x_n\| + \|y_n\|) = \|\boldsymbol{x}\| + \|\boldsymbol{y}\|,$$

$$\|\boldsymbol{x}\| = 0 \;\Rightarrow\; \|x_n\| \to 0 \;\Rightarrow\; \boldsymbol{x} = [x_n] = [0] = \boldsymbol{0}.$$

Note that the zero can be represented by the Cauchy sequence $(0)$, and $-\boldsymbol{x}$ by $(-x_n)$. Furthermore, recall that there is a copy of $X$ in $\widetilde{X}$ (as constant sequences); the operations just defined on $\widetilde{X}$ reduce to the given operations on $X$, when restricted to it. □

Proposition 7.18 

A normed space $X$ is separable if, and only if, there is a countable subset
$A$ such that $X = \overline{[\![A]\!]}$.

Proof If $X = \overline{A}$, such as when $X$ is separable, then $X = \overline{A} \subseteq \overline{[\![A]\!]} \subseteq X$.

Conversely, suppose $X = \overline{[\![A]\!]}$ with $A$ countable; this means that for any vector
$x$, there is a linear combination of $a_i \in A$ ($a_i \ne 0$), such that

\[ \|\lambda_1 a_1 + \cdots + \lambda_n a_n - x\| < \epsilon, \qquad \exists \lambda_i \in \mathbb{R} \text{ or } \mathbb{C}. \tag{7.1} \]

$[\![A]\!]$ is not countable (unless $A \subseteq \{0\}$), but the set of (finite) linear combinations of
vectors in $A$ using coefficients from $\mathbb{Q} + i\mathbb{Q}$ is countable (why? hint: $\bigcup_n (\mathbb{Q}^2)^n$ is
countable). Choosing $r_i = p_i + i q_i \in \mathbb{Q} + i\mathbb{Q}$, such that $|r_i - \lambda_i| < \epsilon / n\|a_i\|$, and
combining with (7.1), we get

\[ \|r_1 a_1 + \cdots + r_n a_n - x\| \le |r_1 - \lambda_1|\,\|a_1\| + \cdots + |r_n - \lambda_n|\,\|a_n\| + \epsilon < 2\epsilon. \]

This shows that $X$ is separable. □


7.5 Series 


Sequences and convergence play a big role in metric spaces. Normed spaces allow
sequences to be combined with summation, thereby obtaining series $x_1 + \cdots + x_n$.

Definition 7.19

A series $\sum_n x_n$ is a sequence of vectors in a normed space obtained by addition,
$(x_1, x_1 + x_2, x_1 + x_2 + x_3, \ldots)$; the $N$th term of the sequence is denoted by
$\sum_{n=1}^N x_n$. Therefore, a series converges when $\|x - \sum_{n=1}^N x_n\| \to 0$ for some $x \in X$
as $N \to \infty$; in this case the limit $x$ is called its sum

\[ x_1 + x_2 + \cdots = \sum_{n=1}^{\infty} x_n := \lim_{N\to\infty} \sum_{n=1}^{N} x_n = x. \]

A series is said to converge absolutely when $\sum_n \|x_n\|$ converges in $\mathbb{R}$.


Examples 7.20 

1. We can convert some results about convergence of sequences to series:

(a) $\sum_n (x_n + y_n) = \sum_n x_n + \sum_n y_n$ when the latter converge; similarly,
$\sum_n \lambda x_n = \lambda \sum_n x_n$.

(b) A series is Cauchy when $x_n + \cdots + x_m \to 0$ as $n, m \to \infty$.

2. If a series converges both normally and absolutely, then

\[ \Big\| \sum_{n=1}^{\infty} x_n \Big\| \le \sum_{n=1}^{\infty} \|x_n\|. \]

Proof Take the limit of $\|x_1 + \cdots + x_n\| \le \|x_1\| + \cdots + \|x_n\|$ as $n \to \infty$.

3. There are series that converge but not absolutely. As an example, take any decreasing
sequence of positive real numbers $a_n \to 0$, then $\sum_n (-1)^n a_n$ converges in $\mathbb{R}$
(Leibniz); yet $\sum_n a_n$ may diverge.

Indeed, when $\sum_n a_n = \infty$ and $0 \le a_n \to 0$, the series $\sum_n \pm a_n$ can converge to
any $a \in \mathbb{R}$ by a judicious choice of signs. Take enough terms $a_n$ to just exceed $a$,
then reverse sign to lower the sum to just less than $a$, then reverse sign again and
continue.

4. A rearrangement of a series need not converge; even if it does, it need not have
the same sum. For example,


l_i.I_I.I_ 

2 ~ 3 4^5 

J- I+I + I + i-L. 

1 _ 1 _ 1 1 _ 1 _ 



+ 



log 2, 

00, 

5 log 2, 

1. 


5. The sum of a 'sequence' $(x_n)_{n \in \mathbb{Z}}$ can also be given a meaning:

\[ \sum_{n \in \mathbb{Z}} x_n = \sum_{n=-\infty}^{\infty} x_n := \sum_{n=1}^{\infty} x_{-n} + \sum_{n=0}^{\infty} x_n \]

when the latter two series converge.
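
The sign-choosing procedure of Example 3 is easy to try out numerically. The sketch below is an illustration only: it assumes the terms $a_n = 1/n$ (one admissible choice with $\sum_n a_n = \infty$ and $a_n \to 0$), uses floating point arithmetic, and the function names are hypothetical.

```python
def signed_partial_sum(target, terms):
    """Greedy choice of signs: add the next term while the running sum is at
    or below the target, subtract it while the sum is above the target.
    Returns the final partial sum."""
    s = 0.0
    for a in terms:
        s += a if s <= target else -a
    return s

def harmonic_terms(count):
    """a_n = 1/n for n = 1..count: positive, tending to 0, divergent sum."""
    return [1.0 / n for n in range(1, count + 1)]
```

Each time the running sum crosses the target it overshoots by at most the term just used, and these terms tend to zero, which is why the partial sums settle on the chosen value.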

In general, absolute convergence does not imply, nor is it implied by, convergence
of $\sum_n x_n$. But for Banach spaces, one implication holds:

Proposition 7.21 

A normed space X is complete if, and only if, any absolutely convergent 
series in X converges. 

Proof Let $X$ be a Banach space, and suppose that $\sum_n \|x_n\|$ converges. Let
$y_N := \sum_{n=0}^{N} x_n$, so that for $M > N$,

\[ \|y_M - y_N\| = \Big\| \sum_{n=N+1}^{M} x_n \Big\| \le \sum_{n=N+1}^{M} \|x_n\| \to 0, \quad \text{as } N, M \to \infty. \]

Hence $(y_N)$ is a Cauchy sequence in the complete space $X$, and so converges.

Conversely, let $X$ be a normed space for which every absolutely convergent series
converges. Let $(x_n)$ be a Cauchy sequence in $X$, so that for $n, m \ge N_\epsilon$ large enough,
$\|x_n - x_m\| < \epsilon$. Letting $\epsilon := 1/2^r$, $r = 1, 2, \ldots$, we can find ever larger numbers $n_r$
such that $\|x_{n_{r+1}} - x_{n_r}\| < 1/2^r$. Thus,

\[ \sum_{r=1}^{\infty} \|x_{n_{r+1}} - x_{n_r}\| \le \sum_{r=1}^{\infty} \frac{1}{2^r} = 1. \]

By assumption, since its absolute series converges, so does $\sum_r (x_{n_{r+1}} - x_{n_r})$, i.e.,

\[ x_{n_{r+1}} - x_{n_1} = (x_{n_2} - x_{n_1}) + (x_{n_3} - x_{n_2}) + \cdots + (x_{n_{r+1}} - x_{n_r}) \]

converges as $r \to \infty$. This forces the subsequence $x_{n_r}$ to converge, and so must the
parent Cauchy sequence $(x_n)$ (Proposition 4.2). □

Series can be used to extend the idea of a basis as follows: a fixed list of unit vectors
$e_n$ is called a Schauder basis when for any $x \in X$ there are unique coefficients $\alpha_n$
such that

\[ x = \sum_{n=1}^{\infty} \alpha_n e_n. \]

This implies that $X = \overline{[\![e_1, e_2, \ldots]\!]}$, and by necessity $X$ must be separable (though
not every separable space has a Schauder basis [41]). Since a vector $x = \sum_n \alpha_n e_n$
is identified by its sequence of coefficients $(\alpha_n)$ with respect to a Schauder basis, the
space $X$ is essentially a sequence space (with norm $\|(\alpha_n)\| := \|\sum_n \alpha_n e_n\|_X$). There
are cases where a permutation of a Schauder basis does not remain a basis; if it does,
the basis is termed unconditional; again, not every space has an unconditional basis
(e.g. $L^1$ and $C[0, 1]$).


Convergence Tests 

Real series are easier to handle than series of vectors, and a number of tests for 
absolute convergence have been devised: 

Comparison Test. If $\|x_n\| \le \|y_n\|$ then $\sum_{n=0}^{N} \|x_n\| \le \sum_{n=0}^{N} \|y_n\|$. If the latter
converges to $\sum_{n=0}^{\infty} \|y_n\|$, then $\sum_{n=0}^{N} \|x_n\|$ is increasing and bounded above, so
converges.

An important special case is comparison with the geometric series, $\|x_n\| \le r^n$
with $r < 1$, because $1 + r + r^2 + \cdots = 1/(1 - r)$. This leads to:

Root Test. Let $r := \limsup_n \|x_n\|^{1/n}$,

(a) if $r < 1$ then the series $\sum_n x_n$ is absolutely convergent,

(b) if $r = 1$ then the series may or may not converge,

(c) if $r > 1$ then the series diverges.

Proof (a) $\|x_n\| \le (r + \epsilon)^n$ except for finitely many terms. Since the right-hand side is
a convergent geometric series when $r < 1$ and $\epsilon$ is taken small enough, the left-hand
side series also converges by comparison.

(b) The series $\sum_n \frac1n = \infty$ and $\sum_n \frac{1}{n^2} < 2$ both have $r = 1$.

(c) When $r > 1$, $\|x_n\| \ge (1 + \epsilon)^n > 1$ for infinitely many terms, so the series
$\sum_n \|x_n\|$ cannot possibly converge.

Fig. 7.2 Schauder

Juliusz Schauder (1899-1943), after fighting in WWI, graduated at 24 years from the
University of Lwow under Steinhaus with a dissertation on statistics. He continued
researching in the Banach/Steinhaus school, giving the theory of compact operators its
modern shape; he proved that the adjoint of a compact operator is compact, the Schauder
fixed point theorem, and generalized aspects of orthonormal bases to Banach spaces;
later he specialized to partial differential equations. Along with many other Polish
academics, he was killed by the Nazis during WWII.

Ratio Test. (D'Alembert's) If the ratios $\|x_{n+1}\| / \|x_n\| \to r$ then $\|x_n\|^{1/n} \to r$; it
is often easier to find the first limit, if it exists, than the second.

Proof The idea is that for large $n$, $\|x_n\| \approx r\|x_{n-1}\| \approx r^n \|x_0\|$, so $\|x_n\|^{1/n} \approx r$.
More precisely, for $n \ge N$ large enough,

\[ r - \epsilon < \|x_n\| / \|x_{n-1}\| < r + \epsilon, \]
\[ \therefore\; (r - \epsilon)^{n-N} \|x_N\| < \|x_n\| < (r + \epsilon)^{n-N} \|x_N\|, \]
\[ \therefore\; r - 2\epsilon < \|x_n\|^{1/n} < r + 2\epsilon, \]

since $(r \pm \epsilon)^{-N/n} \|x_N\|^{1/n} \to 1$.
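
To see the two limits side by side, one can tabulate both quantities for a concrete sequence. The sketch below is illustrative only; it assumes the hypothetical test sequence $\|x_n\| = n/2^n$, for which both limits equal $1/2$.

```python
def ratio_and_root(norm, n):
    """Return (||x_n|| / ||x_{n-1}||, ||x_n|| ** (1/n)) for a function
    `norm` giving the norms ||x_n|| of the terms of a series."""
    return norm(n) / norm(n - 1), norm(n) ** (1.0 / n)

def example_norm(n):
    """Hypothetical test sequence ||x_n|| = n / 2**n (keep n below about
    1000 so that 2.0**n stays within floating point range)."""
    return n / 2.0 ** n
```

With `n = 500`, the ratio is already within about $0.001$ of $1/2$ while the $n$-th root is still off by about $0.006$, which matches the remark that the ratio is often the easier limit to find.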

Cauchy's Test. If $\|x_n\|$ is decreasing, then $\sum_n \|x_n\|$ converges $\Leftrightarrow$ $\sum_n 2^n \|x_{2^n}\|$
converges.

Proof Let $r_n := \|x_n\|$; the test follows from two comparisons,

\[ r_1 + r_2 + \cdots + r_{2^{n+1}-1} = r_1 + (r_2 + r_3) + \cdots + (r_{2^n} + \cdots + r_{2^{n+1}-1}) \le r_1 + 2r_2 + \cdots + 2^n r_{2^n}, \]

\[ r_1 + 2r_2 + 4r_4 + \cdots + 2^n r_{2^n} \le r_1 + 2r_2 + 2(r_3 + r_4) + \cdots + 2(r_{2^{n-1}+1} + \cdots + r_{2^n}) \le 2(r_1 + r_2 + \cdots + r_{2^n}). \]

Kummer's Test. Let $\sum_n \frac{1}{r_n}$ be a divergent series of positive terms and

\[ \frac{r_{n+1}}{r_n} \frac{\|x_{n+1}\|}{\|x_n\|} = 1 - \frac{a}{r_n} + o(1/r_n). \]

If $a > 0$, then the series $\sum_n x_n$ converges absolutely, otherwise when $a < 0$ the
series diverges. For example, $r_n := 1$ gives the ratio test, $r_n := n$ is Gauss's or
Raabe's test, and $r_n := n \log n$ is Bertrand's test.

Proof When $a > 0$, we are given that $c\|x_n\| \le r_n \|x_n\| - r_{n+1} \|x_{n+1}\|$ for $n \ge N$
large enough, and some $0 < c < a$. Summing up these inequalities results in

\[ c(\|x_N\| + \cdots + \|x_m\|) \le r_N \|x_N\| - r_{m+1} \|x_{m+1}\| \le r_N \|x_N\| \]

so the series converges as it is increasing but bounded above.

When $a < 0$, we have $r_n \|x_n\| \le r_{n+1} \|x_{n+1}\|$ for $n \ge N$ large enough. Hence

\[ \|x_n\| \ge \frac{r_N \|x_N\|}{r_n} \]

and the series diverges by comparison with the series $\sum_n \frac{1}{r_n}$.

There are other tests, for example, Cauchy's inequality shows that $\sum_n a_n b_n$
converges when $\sum_n |a_n|^2$ and $\sum_n |b_n|^2$ do.
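
The two comparisons in the proof of Cauchy's test can be verified numerically for any decreasing positive sequence. This is a minimal sketch with illustrative function names; `r` is any Python function returning the decreasing norms $r_k = \|x_k\|$.

```python
def partial_sum(r, last):
    """r(1) + r(2) + ... + r(last)."""
    return sum(r(k) for k in range(1, last + 1))

def condensed_sum(r, n):
    """r(1) + 2 r(2) + 4 r(4) + ... + 2**n r(2**n)."""
    return sum(2 ** k * r(2 ** k) for k in range(n + 1))
```

For $r_k = 1/k^2$ the condensed sum is a geometric series bounded by 2, so the original series converges; for $r_k = 1/k$ the condensed sum equals $n + 1$, so both diverge together.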

Exercises 7.22 

1. If a series $\sum_n x_n$ converges, then $x_n \to 0$. The converse is false:

\[ 1 + \frac12 + \frac13 + \frac14 + \cdots + \frac1n + \cdots \to \infty. \]

More generally, for any fixed $k$, $x_N + x_{N+1} + \cdots + x_{N+k} \to 0$ and $\sum_{n=N}^{\infty} x_n \to 0$,
as $N \to \infty$.

2. If $\sum_n \|x_{nm} - x_n\| \to 0$ as $m \to \infty$, then

\[ \lim_{m\to\infty} \sum_n x_{nm} = \sum_n \lim_{m\to\infty} x_{nm} = \sum_n x_n, \]

if the latter converges.

3. From the geometric series, it follows that $1 - a + a^2 - a^3 + \cdots$ and $\sum_n a^{r_n}$
($r_n \ge n$) converge for $|a| < 1$ in $\mathbb{R}$.

4. The series 1 + 4r + yi + ■ ■ • , X« %■> anc ^ X« Pi converge by comparison with 
a geometric series (or using the ratio test). 


5. $1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots = \frac{\pi^2}{6}$. This series was too hard to sum before Euler; show

at least that it converges, using the comparison \ f _ I
Generalize this to the case (—[jp-i ~ ^ t0 sh° w that Jp converges

for p > 1. Deduce that ]T n converges, by comparison. 

6. These last series are examples that converge slower than the geometric series; in
fact they are not decided by the root and ratio tests. Are there series that converge
even slower?

7. The Cauchy or Raabe tests can also be used to show that $1 + \frac{1}{2^p} + \frac{1}{3^p} + \cdots$
converges only when $p > 1$. Show further that $\sum_n \frac{1}{n \log n}$, $\sum_n \frac{1}{n \log n \log\log n}$,
..., diverge.

8. The Weierstraß M-test (comparison test for $L^\infty$): if $\|f_n\|_{L^\infty} \le M_n$ where $\sum_n M_n$
converges, then $\sum_n f_n$ converges in $L^\infty(A)$ (i.e., uniformly). Use it to show that
the function /( x) := X«^=i converges uniformly on [—1, 1], 

9. Let $f_n(x) := e^{-nx}/n$, then $\|f_n\|_{L^1[0,1]} \le 1/n^2$, and so $\sum_n f_n$ converges in
$L^1[0, 1]$.

10. What is wrong with this argument: When $\|x_n\|^{1/n} \to 1$, then $\|x_n\| \ge (1 - \epsilon)^n$
for infinitely many terms; the right-hand side sums to $1/\epsilon$, which is arbitrarily
large; hence the series cannot converge absolutely.

11. A rearrangement of an absolutely convergent series also converges, to the same 
sum. (Hint: Eventually, the rearranged series will contain the first N terms.) 

12. Suppose a series $x_1 + x_2 + \cdots$ is split up into two subseries, say $x_1 + x_4 + \cdots$
and $x_2 + x_3 + \cdots$, denoted by $\sum_i x_{n_i}$ and $\sum_j x_{m_j}$. If they both converge, to
$x$ and $y$ respectively, then the original series $\sum_n x_n$ also converges, to $x + y$.
If one converges, and the other diverges, then the series $\sum_n x_n$ diverges. But it
is possible for two subseries to diverge, yet the original series to converge; for
example, $1 - \frac12 + \frac13 - \frac14 + \cdots \to \log 2$.

13. Cesàro limit: A sequence $(x_n)$ is said to converge in the sense of Cesàro when
$\frac{x_1 + \cdots + x_n}{n}$ converges. Show that if $a = \lim_{n\to\infty} x_n$ exists then the Cesàro limit is
also $a$. Show that the divergent sequence $(-1)^n$ is Cesàro convergent to 0.
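
The Cesàro means of the last exercise are simple to compute, which may help in forming a conjecture before attempting a proof. The following is only a numerical sketch with an illustrative function name.

```python
def cesaro_means(xs):
    """Running averages (x_1 + ... + x_n) / n of a finite list xs."""
    means, total = [], 0.0
    for n, x in enumerate(xs, start=1):
        total += x
        means.append(total / n)
    return means
```

Applied to `[(-1) ** n for n in range(1, 1001)]`, the averages alternate between $0$ and $-1/n$, so they tend to 0 although the sequence itself does not converge.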

Remarks 7.23 

1. Weighted spaces are defined similarly to $\ell^p$ and $L^p$ but with a different measure
or weight. For example, an $\ell^1_w$ space with weights $w_n > 0$ consists of
sequences with bounded norms $\|x\|_{\ell^1_w} := \sum_n |x_n| w_n$. Similarly, $L^2_w(A)$ has norm
$\left(\int_A |f(x)|^2 w(x)\,dx\right)^{1/2}$. In fact, weighted spaces are isomorphic to the unweighted
spaces; for example $\ell^1_w \cong \ell^1$ via the map $(x_n) \mapsto (w_n x_n)$.

2. The second norm axiom requires that the field be normed. A famous theorem by
Frobenius states that the only normed fields over the reals are R and C.

3. Cauchy's inequality was known to Lagrange in the form



N N N 2 N 


4. Hausdorff's Maximality Principle $\Rightarrow$ Axiom of Choice.

Proof Let $\mathcal{A} = \{ A_\alpha \subseteq X : \alpha \in I \}$ be a collection of non-empty subsets of a set $X$.
Consider pairs $(J, g)$ where $J$ is a subset of $I$ and $g$ is an associated choice function
$g : J \to X$, i.e., $g(\alpha) \in A_\alpha$ for all $\alpha \in J$. To prove the axiom of choice we need to
show that there is a choice function $f$ with domain $I$.

Let $(J, g) \le (\tilde{J}, \tilde{g})$ mean that $J \subseteq \tilde{J}$ and $\tilde{g}$ extends $g$, i.e., $\tilde{g}(\alpha) = g(\alpha)$ whenever
$\alpha \in J$. By Hausdorff's maximality principle, there is a maximal chain of nested sets
and their choice functions $(J_i, g_i)$. The union $J := \bigcup_i J_i$ also has a choice function,
namely $f(\alpha) := g_i(\alpha)$ whenever $\alpha \in J_i$, and it is the one sought for:

$f$ is well-defined: If $\alpha$ belongs to more than one index set, say $J_i$ and $J_j$, then
without loss of generality, $J_j \subseteq J_i$ and $g_i$ extends $g_j$, say, so $g_i(\alpha) = g_j(\alpha)$.

$f$ is a choice function on $J$: If $\alpha \in J$ then $\alpha \in J_i$ for some $i$, so $f(\alpha) = g_i(\alpha) \in A_\alpha$.

$J = I$: Otherwise there is some index $\beta \in I \setminus J$, and an element $x_\beta \in A_\beta$; $f$ can
be extended further to $\tilde{f}$ defined by

\[ \tilde{f}(\alpha) := \begin{cases} f(\alpha) & \alpha \in J \\ x_\beta & \alpha = \beta \end{cases} \]

Then $\tilde{f}$ is a choice function on its domain $J \cup \{\beta\}$, and extends every choice
function $g_i : J_i \to X$ in the maximal chain, a contradiction. □


Chapter 8 

Continuous Linear Maps 


8.1 Operators 

In every branch of mathematics which concerns itself with sets having some partic- 
ular structure, the functions which preserve that structure, called morphisms, feature 
prominently. Such maps allow us to transfer equations from one space to another, to 
compare them with each other and state when two spaces are essentially the same, 
or if not, whether one can be embedded in the other, etc. Even in applications, it is 
often the case that certain aspects of a process are conserved. For example, a rotation 
of geometric space yields essentially the same space. The morphisms on normed 
spaces are formalized by the following definition. 

Definition 8.1 


An operator$^1$ is a continuous linear transformation $T : X \to Y$ between
normed spaces (over the same field), that is, it preserves vector addition, scalar
multiplication, and convergence,

\[ T(x + y) = Tx + Ty, \qquad T(\lambda x) = \lambda Tx, \qquad T(\lim_{n\to\infty} x_n) = \lim_{n\to\infty} T x_n. \]

A functional is a continuous linear map $\phi : X \to \mathbb{F}$ from a normed space to
its field. The set of operators from $X$ to $Y$ is denoted by $B(X, Y)$, and the set
of functionals, denoted by $X^*$, is called the dual space of $X$.


1 The use of the term operator is not standardized: it may simply mean a linear transformation, 
or even just a function, especially outside Functional Analysis. But it is standard to write Tx 
instead of T(x). 


J. Muscat, Functional Analysis, DOI: 10.1007/978-3-319-06728-5_8,
© Springer International Publishing Switzerland 2014

Easy Consequences

1. $T0 = 0$.

2 - = 

3. A linear map is determined by the values it takes on the unit sphere. 

A simple test for continuity of a linear transformation is the following Lipschitz 
or “bounded” property. 

Proposition 8.2 


A linear transformation $T : X \to Y$ is continuous if, and only if, $T$ is a
Lipschitz map

\[ \exists c > 0, \quad \forall x \in X, \quad \|Tx\|_Y \le c\|x\|_X. \]

Proof The definition of a Lipschitz map reads, when applied for normed spaces,
$\|f(x) - f(y)\| \le c\|x - y\|$ for some $c > 0$. When $f$ is in fact a linear map $T$, it
becomes $\|T(x - y)\| \le c\|x - y\|$, or equivalently, $\|Ta\| \le c\|a\|$ for all $a \in X$. That
Lipschitz maps are (uniformly) continuous is true in every metric space (Examples
4.15(3)), but can easily be seen in this context. If $x_n \to x$, then $Tx_n \to Tx$, since

\[ \|Tx_n - Tx\| = \|T(x_n - x)\| \le c\|x_n - x\| \to 0. \]

Conversely, suppose the ratios $\|Tx\| / \|x\|$ are unbounded. Since scaling $x$ does not
affect this ratio (because $T$ is linear), there must be vectors $x_n$ such that $\|Tx_n\| = 1$
but $\|x_n\| \le 1/n$. So $x_n \to 0$ yet $Tx_n \not\to 0$, and $T$ is not continuous. □

Proposition 8.3 


If $T : X \to Y$ is an operator,

(i) the image of a linear subspace $A$ of $X$ is a linear subspace $TA :=
\{ Tx : x \in A \}$ of $Y$,

(ii) the pre-image of a closed linear subspace $B$ of $Y$ is a closed linear
subspace $T^{-1}B := \{ x \in X : Tx \in B \}$ of $X$.

The image and pre-image of convex subsets are convex.

In particular, its image $\operatorname{im} T := TX$ is a linear subspace; and its kernel
$\ker T := T^{-1}0$ is a closed linear subspace.

Proof (i) Let $Tx, Ty \in TA$, then $Tx + Ty = T(x + y) \in TA$, and
$\lambda Tx = T(\lambda x) \in TA$.

(ii) Let $x, y, x_n \in T^{-1}B$, that is, $Tx, Ty, Tx_n \in B$, and let $\lambda \in \mathbb{F}$. Then

\[ T(x + y) = Tx + Ty \in B, \qquad T(\lambda x) = \lambda Tx \in B, \]
\[ x_n \to a \;\Rightarrow\; Ta = T(\lim_{n\to\infty} x_n) = \lim_{n\to\infty} Tx_n \in B, \]

show that $T^{-1}B$ is a closed linear subspace.

(iii) Let $Tx, Ty \in TA$, where $A \subseteq X$ is a convex subset. Then for any $0 \le t \le 1$,
$z := tx + (1 - t)y$ is in $A$, so

\[ t\,Tx + (1 - t)Ty = T(tx + (1 - t)y) = Tz \in TA \]

shows $TA$ is also convex. Now let $B \subseteq Y$ be convex, and let $x, y \in T^{-1}B$, i.e.,
$a := Tx$, $b := Ty$ are both in $B$. Then, by convexity of $B$,

\[ T(tx + (1 - t)y) = ta + (1 - t)b \in B \]

and $tx + (1 - t)y \in T^{-1}B$ as required. □


Examples 8.4 

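1. An operator $T$ maps the linear subspace $[\![A]\!]$ to $[\![TA]\!]$ because

\[ x = \sum_{i=1}^{n} \alpha_i a_i \;\Rightarrow\; Tx = \sum_{i=1}^{n} \alpha_i T a_i. \]

In particular it maps a straight line to another straight line (or to the origin),
hence the name "linear" applied to operators.

2. ► A linear transformation from $\mathbb{C}^N$ to $\mathbb{C}^M$ takes the form of a matrix. Letting
$\mathbb{C}^N = [\![e_1, \ldots, e_N]\!]$, $\mathbb{C}^M = [\![e'_1, \ldots, e'_M]\!]$, $x = \sum_i \alpha_i e_i = (\alpha_1, \ldots, \alpha_N)^\top$, and
$Te_i = \sum_j T_{ji} e'_j$, then

\[ Tx = \sum_{i=1}^{N} \alpha_i T e_i = \sum_{i=1}^{N} \alpha_i \sum_{j=1}^{M} T_{ji} e'_j = \begin{pmatrix} T_{11} & \cdots & T_{1N} \\ \vdots & & \vdots \\ T_{M1} & \cdots & T_{MN} \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix}. \]

Every matrix is continuous,

\[ \|Tx\|_2 \le \Big( \sum_{j=1}^{M} \sum_{i=1}^{N} |T_{ji}|^2 \Big)^{1/2} \|x\|_2 \quad \text{(Exercise 7.7(7))}. \]

3. A functional from $\mathbb{C}^N$ to $\mathbb{F}$ is then a $1 \times N$ matrix, otherwise known as a row
vector.

\[ \phi \begin{pmatrix} a_1 \\ \vdots \\ a_N \end{pmatrix} = \sum_{n=1}^{N} \phi(e_n) a_n = \sum_{n=1}^{N} b_n a_n = (b_1 \ \ldots \ b_N) \begin{pmatrix} a_1 \\ \vdots \\ a_N \end{pmatrix}. \]

4. Generalizing this to functionals on complex sequences, let $y^\top(x) := y \cdot x :=
\sum_n b_n a_n$, where $x = (a_n)$ and $y = (b_n)$. Then $y^\top$ is linear,

\[ y \cdot (x + x') = \sum_n b_n (a_n + a'_n) = \sum_n b_n a_n + \sum_n b_n a'_n = y \cdot x + y \cdot x', \]
\[ y \cdot (\lambda x) = \sum_n b_n \lambda a_n = \lambda \sum_n b_n a_n = \lambda\, y \cdot x, \]

but may or may not be continuous, depending on $y$ and the normed spaces
involved. For example, to show that $\phi(a_n) := \sum_n \frac{(-1)^n}{n^2} a_n$ defined on $\ell^\infty$ is
continuous, note

\[ |\phi x| = \Big| \sum_n \frac{(-1)^n}{n^2} a_n \Big| \le \sum_n \frac{1}{n^2} \sup_n |a_n| \le 2\|x\|_{\ell^\infty}. \]

5. When $X$ has a Schauder basis $(e_n)$, a functional must be in the form of a series:

\[ \phi x = \phi\Big( \sum_n a_n e_n \Big) = \sum_n a_n \phi e_n = \sum_n b_n a_n, \qquad (b_n := \phi e_n,\ a_n \in \mathbb{F}). \]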


6. The identity operator $I : X \to X$, $x \mapsto x$, is trivially linear and continuous.
Similarly for scalar multiplication, $\lambda : x \mapsto \lambda x$.

7. ► The left-shift operator $L : \ell^1 \to \ell^1$ defined by $(a_n) \mapsto (a_{n+1})$, i.e.,

\[ L(a_0, a_1, a_2, \ldots) := (a_1, a_2, a_3, \ldots), \]

is onto, linear, continuous, and satisfies $\|Lx\| \le \|x\|$; its kernel is spanned by $e_0$.

Proof That $L$ is onto is obvious; linearity and continuity follow from

\[ L(a_n + b_n) = (a_1 + b_1, a_2 + b_2, \ldots) = (a_1, a_2, \ldots) + (b_1, b_2, \ldots) = L(a_n) + L(b_n), \]
\[ L(\lambda a_n) = (\lambda a_1, \lambda a_2, \ldots) = \lambda (a_1, a_2, \ldots) = \lambda L(a_n), \]
\[ \|Lx\|_{\ell^1} = \sum_{n=1}^{\infty} |a_n| \le \sum_{n=0}^{\infty} |a_n| = \|x\|_{\ell^1}. \]

For $x \in \ker L$, $(a_1, a_2, \ldots) = Lx = 0$, so $x = (a_0, 0, 0, \ldots) = a_0 e_0$; in fact
$Le_0 = 0$.

8. ► In general, the multiplication of sequences $x \mapsto yx$, defined by $(b_n)(a_n) :=
(b_n a_n)$, is linear on the vector space of sequences. When $|b_n| \le c$, it is continuous
as a map $\ell^p \to \ell^p$ ($p \ge 1$); e.g. for $p = 1$,

\[ \|yx\|_{\ell^1} = \sum_{n=0}^{\infty} |b_n a_n| \le c \sum_{n=0}^{\infty} |a_n| = c\|x\|_{\ell^1}. \]

In finite dimensions, this is equivalent to multiplying $x$ by a diagonal matrix.

9. Solving linear equations $Tx = b$, where $T$ and $b$ are given, is probably the single
most useful application in the whole of mathematics. The complete set of solutions
is $x_0 + \ker T$, where $x_0$ is any individual or particular solution $Tx_0 = b$, and $\ker T$
is the set of solutions of the homogeneous equation $Tx = 0$ (since $T(x - x_0) = 0$).

10. The kernel subspace of a functional $\ker \phi$ is called a hyperplane.
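
The properties of the left-shift operator in Example 7 can be checked on finitely supported sequences. This is a minimal sketch, assuming such a sequence is stored as a Python list with all later entries understood to be zero; the function names are illustrative.

```python
def left_shift(x):
    """L(a_0, a_1, a_2, ...) = (a_1, a_2, ...) for a finitely supported
    sequence stored as a list (entries beyond the list are zero)."""
    return x[1:]

def norm1(x):
    """The l^1 norm of a finitely supported sequence."""
    return sum(abs(a) for a in x)
```

Dropping the first entry can only decrease the sum of absolute values, and a multiple of $e_0$, such as `[7]`, is sent to the zero sequence.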


Integral Operators 

We now consider a broad class of operators that act on spaces of functions. An 
integral operator (or transform ) is a mapping on functions 


\[ Tf(y) := \int_A k(x, y) f(x)\,dx, \]

where $k$ is called the kernel of $T$ (not to be confused with $\ker T$). To motivate this
definition, suppose $T$ is a linear operator that inputs a function $f : A \subseteq \mathbb{R} \to \mathbb{C}$ and
outputs a function $g : B \subseteq \mathbb{R} \to \mathbb{C}$. If $A$ and $B$ are partitioned into small subintervals,
the functions $f$ and $g$ are discretized into vectors $(f_j)$ and $(g_i)$, and the linear
operator $T$ becomes approximately some matrix $[T_{ij}]$. As the partitions are refined, one
might hope that $T_{ij}$ would converge to some function $k(x, y)$ on $A \times B$, and the finite
sums involved in the matrix multiplication $\sum_j T_{ij} f_j$ become integrals
$\int_A k(x, y) f(x)\,dx$. (This is not necessarily the case, as the identity map attests.)

Proposition 8.5

An integral operator $Tf(y) := \int_A k(x, y) f(x)\,dx$ is linear, and is continuous
on the following spaces:

\[ \sup_{x \in A,\, y \in B} |k(x, y)| < \infty \;\Rightarrow\; T : L^1(A) \to L^\infty(B), \]
\[ \int_B \sup_{x \in A} |k(x, y)|\,dy < \infty \;\Rightarrow\; T : L^1(A) \to L^1(B), \]
\[ \sup_{y \in B} \int_A |k(x, y)|\,dx < \infty \;\Rightarrow\; T : L^\infty(A) \to L^\infty(B), \]
\[ \int_B \int_A |k(x, y)|\,dx\,dy < \infty \;\Rightarrow\; T : L^\infty(A) \to L^1(B). \]

Proof Linearity follows easily from

\[ \int_A k(x, y)(\lambda f(x) + g(x))\,dx = \lambda \int_A k(x, y) f(x)\,dx + \int_A k(x, y) g(x)\,dx. \]

(i) Continuity on $L^1(A) \to L^\infty(B)$:

\[ \|Tf\|_{L^\infty(B)} \le \sup_{y \in B} \int_A |k(x, y) f(x)|\,dx \le \sup_{x, y} |k(x, y)| \int_A |f(x)|\,dx. \]

(ii) Continuity on $L^1(A) \to L^1(B)$:

\[ \|Tf\|_{L^1(B)} = \int_B \Big| \int_A k(x, y) f(x)\,dx \Big|\,dy \le \int_B \sup_{x \in A} |k(x, y)|\,dy \int_A |f(x)|\,dx. \]

(iii) Continuity on $L^\infty(A) \to L^\infty(B)$:

\[ \|Tf\|_{L^\infty(B)} \le \sup_{y \in B} \int_A |k(x, y) f(x)|\,dx \le \sup_{y \in B} \int_A |k(x, y)|\,dx\ \|f\|_{L^\infty(A)}. \]

(iv) Continuity on $L^\infty(A) \to L^1(B)$:

\[ \|Tf\|_{L^1(B)} \le \int_B \int_A |k(x, y)|\,|f(x)|\,dx\,dy \le \int_B \int_A |k(x, y)|\,dx\,dy\ \|f\|_{L^\infty(A)}. \]

□
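
The discretization heuristic used to motivate integral operators can be tried out directly. The sketch below is an illustration under stated assumptions, not the author's construction: it takes $A = B = [0, 1]$, samples at the midpoints of equal subintervals, and forms the hypothetical matrix $T_{ij} = k(x_j, y_i)\,h$.

```python
def discretized_operator(kernel, f, cells):
    """Approximate Tf(y) = integral over [0, 1] of kernel(x, y) f(x) dx by the
    matrix-vector product sum_j T[i][j] f_j, with T[i][j] = kernel(x_j, y_i) * h
    sampled at the midpoints of `cells` equal subintervals of [0, 1].
    Returns the sample points y_i and the vector (g_i)."""
    h = 1.0 / cells
    mids = [(j + 0.5) * h for j in range(cells)]
    matrix = [[kernel(x, y) * h for x in mids] for y in mids]
    f_vec = [f(x) for x in mids]
    g_vec = [sum(t * fj for t, fj in zip(row, f_vec)) for row in matrix]
    return mids, g_vec
```

For the kernel $k(x, y) = xy$ and $f(x) = x$, the exact image is $Tf(y) = y/3$, and the matrix product reproduces it to within the midpoint-rule error.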


Examples 8.6 

1. Integration, $f \mapsto \int_A f$, is a functional on $L^1(A)$.

2. The Volterra operator on $L^1[0, 1]$ is $Vf(y) := \int_0^y f$. It is an integral operator
with

\[ k(x, y) := \begin{cases} 1 & x \le y \\ 0 & y < x \end{cases} \]

3. ► The Fourier transform of a function $f \in L^1(\mathbb{R})$ is defined to be the function

\[ \hat{f}(\xi) := \int_{-\infty}^{\infty} e^{-ix\xi} f(x)\,dx. \]

It is an operator $f \mapsto \hat{f}$, $L^1(\mathbb{R}) \to L^\infty(\mathbb{R})$.

4. For integral operators $S$, $T$, with kernels $k_S$, $k_T$ respectively,

(a) $S = T$ only when $k_S = k_T$ a.e., (since for all $f$, $(S - T)f = \int (k_S(x, y) -
k_T(x, y)) f(x)\,dx = 0$);

(b) $S + T$ has kernel $k_S + k_T$, and $\lambda T$ has kernel $\lambda k_T$,

(c) $ST$ has kernel $k_{ST}(x, z) := \int k_S(y, z) k_T(x, y)\,dy$.

The kernel acts like a "matrix" with real-valued indices, $k_{x,y}$ in place of $A_{i,j}$. The
properties listed here are analogous to those of the addition and multiplication of
matrices.

5. Which integral operators on $L^1(\mathbb{R})$ are translation-invariant, meaning $T T_a f =
T_a T f$, where $T_a f(x) = f(x - a)$? The requirement is, for all $f \in L^1(\mathbb{R})$,

\[ \int k(x, y) f(x - a)\,dx = \int k(x, y - a) f(x)\,dx. \]

By changing the $x$-variable in the left-hand integral to $\tilde{x} = x - a$, we obtain
$k(x + a, y) = k(x, y - a)$ a.e., as $f$ is arbitrary. Equivalently, $k(x, y) =
k(x - y, 0) =: k(x - y)$ a.e.$(x, y)$ for some function $k \in L^1(\mathbb{R})$. That is,

\[ Tf = k * f := \int k(x - y) f(y)\,dy \]

called the convolution of $k$ with $f$.

6. An example of a functional that is not integral is given by $\delta_{x_0}(f) := f(x_0)$, acting
on $C(X)$, where $x_0 \in X$.

Proof Linearity is immediate, e.g. $\delta_{x_0}(f + g) = (f + g)(x_0) = f(x_0) + g(x_0) =
\delta_{x_0}(f) + \delta_{x_0}(g)$. For continuity,

\[ |\delta_{x_0} f| = |f(x_0)| \le \sup_{x \in X} |f(x)| = \|f\|_{C(X)}. \]

7. Differentiation of functions is linear (say on the vector space of differentiable
functions) but it is not continuous in the $\infty$-norm, e.g.

\[ \|D \cos(nx)\|_{C(\mathbb{R})} = \|{-n} \sin(nx)\|_{C(\mathbb{R})} = n \]

whereas $\|\cos(nx)\|_{C(\mathbb{R})} = 1$. Similarly, $\|D x^n\|_{C[0,1]} / \|x^n\|_{C[0,1]} \to \infty$ as
$n \to \infty$. (Note: here, $x^n$ and $\cos(nx)$ denote functions.)

Vito Volterra (1860-1940) studied hydrodynamics at Pisa under Betti (1883); this led
him over the next ten years to consider integral equations of the type
$f(x) - \int k(x, y) f(y)\,dy = g(x)$, which he showed can be solved by iteration. He applied
such "functionals" to the theory of optics and distortions, Hamilton-Jacobi dynamics,
elasticity and electromagnetism. He moved from one professorship in Turin to another in
Rome, becoming a senator in 1905, and finding the time to write his Volterra equations
about the numbers of predators and prey in mathematical biology, until in 1931 he
preferred exile to the reign of Mussolini.

Fig. 8.1 Volterra
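
The unboundedness of differentiation in the $\infty$-norm can be observed with the powers $x^n$ on $[0, 1]$. The following sketch is illustrative only: it approximates the supremum by a maximum over an evenly spaced grid (exact here, because both functions are increasing and the grid contains the endpoint 1), and the function names are hypothetical.

```python
def sup_norm(g, samples=1001):
    """Approximate the supremum of |g| over [0, 1] by the maximum on an
    evenly spaced grid that includes both endpoints."""
    return max(abs(g(i / (samples - 1))) for i in range(samples))

def derivative_ratio(n):
    """||D x^n|| / ||x^n|| in C[0, 1], using D x^n = n x^(n-1)."""
    power = lambda x: x ** n
    derivative = lambda x: n * x ** (n - 1)
    return sup_norm(derivative) / sup_norm(power)
```

The ratio equals $n$, so no single constant $c$ can satisfy $\|Df\| \le c\|f\|$ for all differentiable $f$.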

Theorem 8.7 


$B(X, Y)$ is a vector space with a norm defined by

\[ \|T\| := \sup_{x \ne 0} \frac{\|Tx\|_Y}{\|x\|_X} = \sup_{\|x\| = 1} \|Tx\|_Y. \]

$B(X, Y)$ is complete when $Y$ is complete. In particular, $X^*$ is a Banach
space, with norm

\[ \|\phi\| := \sup_{x \ne 0} \frac{|\phi x|}{\|x\|}. \]

Proof The norm is well-defined in the sense that if $T$ is an operator, then $\|Tx\| / \|x\| \le c$
for all non-zero $x \in X$, so the supremum $\|T\|$ of these ratios, which is the least such
upper bound $c$, exists. In fact, a linear map belongs to $B(X, Y)$ if, and only if,
$\|T\| < \infty$, in which case

\[ \|Tx\| \le \|T\|\,\|x\|. \]

This inequality is used extensively in the rest of the text.

Addition and scalar multiplication of operators is defined by

\[ (S + T)x := Sx + Tx, \qquad (\lambda T)x := \lambda Tx. \]

That $B(X, Y)$ with these operations is a vector space is a straightforward calculation,
using the linearity and continuity of these operations in $X$ and $Y$ (Proposition 7.8).
For example,

\[ (\lambda T)(x + y) = \lambda T(x + y) = \lambda Tx + \lambda Ty = (\lambda T)x + (\lambda T)y. \]

More crucially,

\[ \|S + T\| = \sup_{\|x\|=1} \|Sx + Tx\| \le \sup_{\|x\|=1} (\|Sx\| + \|Tx\|) \le \sup_{\|x\|=1} \|Sx\| + \sup_{\|x\|=1} \|Tx\| = \|S\| + \|T\|, \]
\[ \|\lambda T\| = \sup_{\|x\|=1} \|\lambda Tx\| = \sup_{\|x\|=1} |\lambda|\,\|Tx\| = |\lambda|\,\|T\|, \]
\[ \|T\| = 0 \;\Leftrightarrow\; \forall x,\ \|Tx\| = 0 \;\Leftrightarrow\; T = 0. \]

$B(X, Y)$ is complete if $Y$ is: Let $T_n$ be a Cauchy sequence of operators in $B(X, Y)$,
that is, $\|T_n - T_m\| \to 0$ as $n, m \to \infty$. Then, for each $x \in X$,

\[ \|T_n x - T_m x\| \le \|T_n - T_m\|\,\|x\| \to 0 \]

implies that $(T_n x)$ is a Cauchy sequence in $Y$, so that $T_n x$ converges to some vector
which can be denoted by $T(x)$, if $Y$ is complete. We now show that $T$ is linear: letting
$n \to \infty$ in

\[ T_n(x + y) = T_n x + T_n y, \qquad T_n(\lambda x) = \lambda T_n x, \]

gives

\[ T(x + y) = Tx + Ty, \qquad T(\lambda x) = \lambda Tx, \]

by continuity of addition and scalar multiplication.

Finally, for any $\epsilon > 0$ and any $x \in X$,

\[ \|(T_n - T)x\| \le \|T_n - T_m\|\,\|x\| + \|T_m x - Tx\| < \epsilon\|x\| + \epsilon\|x\|, \]

where $m$ is chosen large enough, depending on $x$, to make $\|T_m x - Tx\| < \epsilon\|x\|$,
and $n, m \ge N$ large enough to make $\|T_n - T_m\| < \epsilon$. Hence $\|T_n - T\| \le 2\epsilon$ for
$n \ge N$. This shows that $T_n - T$, and so $T$, are continuous, and furthermore that
$T_n \to T$. □

Proposition 8.8 

If $T : X \to Y$ and $S : Y \to Z$ are operators, then so is their composition
$ST$, with $\|ST\| \le \|S\|\,\|T\|$.

$B(X) := B(X, X)$ is closed under multiplication.

Proof That $ST$ is linear is obvious: $ST(x + y) = S(Tx + Ty) = STx + STy$ and
$ST(\lambda x) = S(\lambda Tx) = \lambda STx$. Also,

\[ \|STx\| = \|S(Tx)\| \le \|S\|\,\|Tx\| \le \|S\|\,\|T\|\,\|x\|, \]

and the result follows by taking the supremum for unit vectors $x$. □
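
The inequality $\|ST\| \le \|S\|\,\|T\|$ can be tested on small matrices. The sketch below is an illustration, not part of the proof; it assumes matrices stored as lists of rows acting between spaces with the $\infty$-norm, for which the operator norm is the largest absolute row sum (a formula quoted in Examples 8.9). The function names are hypothetical.

```python
def matmul(S, T):
    """Matrix product ST of two matrices stored as lists of rows."""
    return [[sum(s * t for s, t in zip(row, col)) for col in zip(*T)] for row in S]

def norm_inf(T):
    """Operator norm for the infinity-norms: the largest absolute row sum."""
    return max(sum(abs(t) for t in row) for row in T)
```

For example, with `S = [[1, -2], [3, 4]]` and `T = [[0, 1], [-1, 2]]` the norms are 7 and 3, while the product has norm 15, below the bound 21; the inequality is usually strict.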

Examples 8.9 

1. $\|0\| = 0$, $\|I\| = 1$; more generally, $\|\lambda I\| = |\lambda|$.

2. ► Every matrix $T : \mathbb{R}^N \to \mathbb{R}^M$ is continuous. Let a matrix $T$ have coefficients
$T_{ij}$, then similar reasoning as in Proposition 8.5 shows $T$ is continuous with

(a) the 2-norms: $\|T\| \le \sqrt{\sum_{ij} |T_{ij}|^2}$, and

\[ \|T\| \le \sqrt{\max_j \Big( \sum_i |T_{ij}| \Big) \max_i \Big( \sum_j |T_{ij}| \Big)}, \]

(b) the $\infty$-norms: $\|T\| = \max_i \sum_j |T_{ij}|$,

(c) the 1-norms: $\|T\| = \max_j \sum_i |T_{ij}|$.

Note that, just like vectors in $\mathbb{R}^N$, there are various norms applicable to matrices,
but that in any of them $\|T\|$ depends continuously on its coefficients: changing
them slightly by at most $\epsilon$ does not change $T$ drastically, e.g.

\[ \|S - T\| \le N \max_{ij} |S_{ij} - T_{ij}| \le N\epsilon. \]

Proof of (a). Let $x = (a_j)$. By Cauchy's inequality, $|\sum_j T_{ij} a_j|^2 \le \sum_j |T_{ij}|^2
\sum_j |a_j|^2$ for each $i$, so

\[ \|Tx\|^2 = \sum_i \Big| \sum_j T_{ij} a_j \Big|^2 \le \sum_i \sum_j |T_{ij}|^2\, \|x\|^2. \]

The second inequality, known as Schur's test and sometimes an improvement
on the first inequality, states that $\|T\|$ is at most the geometric mean $\sqrt{rc}$ of its
"largest" column and row, where $r := \max_i \sum_j |T_{ij}|$ and $c := \max_j \sum_i |T_{ij}|$. Again by
Cauchy's inequality,

\[ \forall i, \quad \Big| \sum_j T_{ij} a_j \Big| \le \sum_j \sqrt{|T_{ij}|} \sqrt{|T_{ij}|}\, |a_j| \le \sqrt{\sum_j |T_{ij}|} \sqrt{\sum_j |T_{ij}|\,|a_j|^2}, \]
\[ \|Tx\|^2 = \sum_i \Big| \sum_j T_{ij} a_j \Big|^2 \le r \sum_i \sum_j |T_{ij}|\,|a_j|^2 \le rc\,\|x\|^2. \]

3. The norm of the operator $y^\top$ is $\|y\|_{\ell^\infty}$ when considered as a map $\ell^1 \to \mathbb{F}$.

Proof Taking $x = (a_n)$, $y = (b_n)$,

\[ |y \cdot x| \le \sum_n |b_n|\,|a_n| \le (\sup_n |b_n|) \sum_n |a_n| = \|y\|_{\ell^\infty} \|x\|_{\ell^1} \]

gives $\|y^\top\| \le \|y\|_{\ell^\infty}$. Since the supremum $\|y\|_{\ell^\infty}$ is a boundary point of the set
$\{ |b_n| : n = 0, 1, \ldots \}$, there is a sequence $|b_{n_i}| \to \|y\|_{\ell^\infty}$, so that $\|y^\top\| \ge \|y\|_{\ell^\infty}$,

\[ \|y^\top\| \ge |y \cdot e_{n_i}| = |b_{n_i}| \to \|y\|_{\ell^\infty} \qquad (\|e_{n_i}\| = 1). \]

4. ► Any linear continuous operator on normed spaces, $T : X \to Y$, is Lipschitz,
hence uniformly continuous. For example, it maps the ball $B_r(x)$ into the ball
$B_{\|T\| r}(Tx)$ (Exercise 4.17(3)). By Theorem 4.13, it can be extended uniquely to
an operator on their (Banach) completion spaces, $\widetilde{T} : \widetilde{X} \to \widetilde{Y}$. This extension
remains linear and continuous, and retains the same norm, $\|\widetilde{T}\| = \|T\|$.

Proof For any vector $x \in \widetilde{X}$, there exist vectors $x_n \in X$ such that $x_n \to x$; let
$\widetilde{T}x := \lim_{n\to\infty} Tx_n$. Then, for any other vector $y \in \widetilde{X}$, with $y_n \to y$, $y_n \in X$,

\[ \widetilde{T}(\lambda x + y) = \lim_{n\to\infty} T(\lambda x_n + y_n) = \lim_{n\to\infty} (\lambda Tx_n + Ty_n) = \lambda \widetilde{T}(x) + \widetilde{T}(y), \]
\[ \|\widetilde{T}x\| = \lim_{n\to\infty} \|Tx_n\| \le \|T\| \lim_{n\to\infty} \|x_n\| = \|T\|\,\|x\|. \]

So $\|\widetilde{T}\| \le \|T\|$, but, as the domain of $\widetilde{T}$ includes that of $T$, equality holds.

5. ||m < ||5|| A l|Tx|| < || Sx || , for example, T = I,S=(^ q), jc = 

6. Let $\phi \in X^*$ and $y \in Y$; then the map $y\phi : x \mapsto (\phi x) y$ is continuous and linear,
with $\|y\phi\| = \|y\|\,\|\phi\|$.

Proof

\[ \|y\phi\| = \sup_{\|x\|=1} \|y\,\phi x\| = \Big( \sup_{\|x\|=1} |\phi x| \Big) \|y\| = \|\phi\|\,\|y\|. \]

7. Suppose we wish to find the solution of $Tx = y$ ($T \in B(X, Y)$), but it is time-consuming
or impossible to calculate $T^{-1}$. If $S \in B(X, Y)$ is easily inverted and
close to $T$, i.e., $T = S + R$ and $\|R\| < \|S^{-1}\|^{-1}$, then $\|S^{-1}R\| < 1$, and the
iteration

\[ x_{n+1} := x_n + S^{-1}(y - Tx_n) = S^{-1}(y - Rx_n) \]

converges to the solution of the equation by the Banach fixed point theorem.
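
The iteration of Example 7 can be sketched for a small matrix equation. The choice of $S$ as the diagonal part of $T$ is an assumption made here for illustration (it gives what is usually called the Jacobi method); the text only requires $S$ to be easily inverted and close to $T$. The function name is hypothetical.

```python
def solve_by_iteration(T, y, steps=60):
    """Iterate x <- x + S^{-1}(y - T x), starting from x = 0, where S is the
    diagonal part of the square matrix T (an illustrative choice of S)."""
    n = len(y)
    x = [0.0] * n
    for _ in range(steps):
        residual = [y[i] - sum(T[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + residual[i] / T[i][i] for i in range(n)]
    return x
```

For `T = [[4, 1], [2, 5]]` one has, in the $\infty$-norms, $\|R\| = 2 < 4 = \|S^{-1}\|^{-1}$, so the hypothesis holds and the iterates approach the solution $(1/6, 1/3)$ of $Tx = (1, 2)$.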

Exercises 8.10 

1. Show that the following are continuous functionals, 

(a) fx := i \a>i on f 2 ;

(b) $\phi x := \sum_{n=0}^{\infty} e^{in\omega} a_n$, and $\phi x := \sum_n \sin(n\omega) a_n$ on $\ell^1$ ($\omega$ is a fixed real
number);

(c) $\delta_i x := a_i$ on $\ell^1$, $\ell^2$, $\ell^\infty$.

2. If $(e_n)$ is a Schauder basis, with $x = \sum_n \alpha_n e_n$ for each $x$, show that the map
$x \mapsto \alpha_n$ is linear. (That it is also continuous is true in a Banach space, but not
obviously.)

3. ► The right-shift operator is defined by $R(a_n) := (0, a_0, a_1, \ldots)$. It is an operator
that satisfies $\|Rx\| = \|x\|$ both as $\ell^\infty \to \ell^\infty$ and $\ell^1 \to \ell^1$; it is 1-1 and
its image is closed. Note that $LR = I \ne RL$. Show that it is also continuous as
$R : \ell^1 \to \ell^\infty$.

4. The mapping $T : \ell^1 \to \ell^1$, defined by $T(a_n) := (a_0, a_1/2, a_2/3, \ldots)$, is linear

and continuous. It is 1-1, and its image, denoted £ \ := im T C (’ 1 , is not closed 
in f} . (Hint: consider (1, 1/2, . . . , 1/n, 0, 0, 0, . . .).) 

5. The mapping D : £ J — > £ l , defined by D(a n ) := ( na n ), is linear and invertible, 
but not continuous. (Hint: D{e n /n ) = e„.) 

6. Other examples of operators (on l 1 or £°°) are 

S(a n ) := (aj, <7 0 , a 3 , a 2 , ■ . •), T(a n ) := (a„ +4 - a n ). 

7. Conjugation in $\mathbb{C}$, $z \mapsto \bar z$, is continuous but not linear. It is conjugate-linear,
because $\overline{\lambda z} = \bar\lambda \bar z \ne \lambda \bar z$ in general.

8. $T\bar A \subseteq \overline{TA}$ for a continuous linear operator $T$.

9. If a linear map is continuous at one point, say 0, then it is continuous everywhere. 

10. When $T : X \to Y$ is 1-1 and linear, then the map $x \mapsto \|Tx\|$ is a norm on $X$.

11. When im T and/or ker T are finite-dimensional, their dimensions are called the 
rank and nullity of T : X -> Y . For matrices, 

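(a) $\operatorname{rank}(ST) \le \min(\operatorname{rank}(S), \operatorname{rank}(T))$, $\operatorname{rank}(S + T) \le \operatorname{rank}(S) + \operatorname{rank}(T)$,

(b) $\operatorname{rank}(T) + \operatorname{nullity}(T) = \dim X$,

(c) Sylvester's inequality: $\operatorname{nullity}(ST) \le \operatorname{nullity}(S) + \operatorname{nullity}(T)$.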

12. Typical examples of functionals acting on functions are of the form
$f \mapsto \int k(x)f(x)\,dx$, where $k$ has to satisfy certain properties for the functional
to be continuous. For example, $\phi f := \int_0^\infty e^{-x} f(x)\,dx$ is a functional on
$L^\infty[0, \infty[$.

13. The integral operator Tf(y) := x~^ y+v> f(x) dx is continuous as 

L°°[l,oo[-> L°°[l,oo[, satisfying \\Tf \\ L co ^ ||/|| L oo. 

14. Some examples of continuous linear maps on C(R) are: 


(a) $Tf(x) := (f(x) + f(-x))/2$,
(b) Translations $T_a f(x) := f(x - a)$; they are isometries and form a group
with $T_a T_b = T_{a+b}$, $I = T_0$, $T_a^{-1} = T_{-a}$,

(c) Multipliers $M_g f(x) := g(x)f(x)$, where $g \in C(\mathbb{R})$.

What are their kernels and image subspaces? 

15. Find, where possible, the norms of the above mentioned operators. For example, 
1 1 5 Y0 1 1 = 1 on C(X), and the Volterra operator on L°°[0, 1] has norm 1. 

16. It is not so easy to calculate || T || in general, even when T is a matrix. Show that, 

with the Euclidean norms, || ® ^ || = max(|k|, |/u|) and || || = 1 = 

| ^ i q ^ | . If you feel up to it, show that for real 2x2 matrices, 


\[ \left\| \begin{pmatrix} a & b \\ c & d \end{pmatrix} \right\| = \sqrt{\frac{a^2 + b^2 + c^2 + d^2 + \sqrt{((a - d)^2 + (b + c)^2)((a + d)^2 + (b - c)^2)}}{2}} \]

(Hint: Use Lagrange multipliers to find the maximum of $(ax + by)^2 + (cx + dy)^2$
subject to $x^2 + y^2 = 1$. See also Exercise 15.20(7).)

17. An integral operator $T : L^1[0, 1] \to L^\infty[0, 1]$, with kernel $k \in L^\infty[0, 1]^2$,
has $\|T\| \le \|k\|_{L^\infty}$. So if $T_n$ have kernels $k_n$ with $k_n \to k$ in $L^\infty[0, 1]^2$, then
$T_n \to T$.

18. If $T_n x_n \to 0$ for any choice of unit vectors $x_n$, then $T_n \to 0$.

19. If S, T e B(X) commute, ST = TS, then S preserves ker T and im T . 

20. An 'affine' map $f(x) := a + Tx$ with $T \in B(X)$ is a contraction mapping when
$\|T\| < 1$. The iteration $x_{n+1} := a + T x_n$, starting from any $x_0$, converges to its
fixed point $y = a + Ty$ (Theorem 4.16).

21. Let $Ax = b$ be a matrix equation, where $A$ is a square matrix. Use Example 8.9(7)
above to describe iterative algorithms for finding the solution of the equation in
the following cases:

(a) (Jacobi) $A$ is almost diagonal in the sense that $A = D + R$, with $D$ being
the diagonal of $A$, and $\|R\| < \|D^{-1}\|^{-1}$.

(b) (Gauss-Seidel) $A$ is almost a lower triangular matrix, in the sense that $A =
L + U$ where $L$ is lower triangular and $\|U\| < \|L^{-1}\|^{-1}$. The inverse of a
triangular matrix is fairly easy to compute.

22. Perturbation Theory. When the solution of an invertible linear equation $S x_0 = y$
is known, one can also find the solutions of 'nearby' equations $(S + \epsilon E)x = y$,
where $\epsilon E$ is a 'perturbation'. Writing $E = -ST$, the new solution satisfies
$(I - \epsilon T)x = x_0$. We might try an expansion of the type $x = x_0 + \epsilon x_1 + \epsilon^2 x_2 + \cdots$;
show that $x_{n+1} = T x_n$, and the series converges if $\|T\| < 1$ and $\epsilon < 1$.
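The closed formula for the norm of a real $2 \times 2$ matrix in Exercise 16 can be checked numerically before attempting the proof. The sketch below compares it with a brute-force maximum of $\|Ax\|$ over sampled unit vectors; the function names `norm_2x2` and `norm_by_sampling` and the test matrices are assumptions chosen for illustration only.

```python
import math


def norm_2x2(a, b, c, d):
    """Closed formula of Exercise 16 for the Euclidean operator norm."""
    s = a * a + b * b + c * c + d * d
    p = ((a - d) ** 2 + (b + c) ** 2) * ((a + d) ** 2 + (b - c) ** 2)
    return math.sqrt((s + math.sqrt(p)) / 2)


def norm_by_sampling(a, b, c, d, samples=100000):
    """Approximate sup ||Ax|| over unit vectors x = (cos t, sin t)."""
    best = 0.0
    for k in range(samples):
        t = 2 * math.pi * k / samples
        x, y = math.cos(t), math.sin(t)
        best = max(best, math.hypot(a * x + b * y, c * x + d * y))
    return best


# A diagonal matrix: the norm should be max(|3|, |-2|) = 3.
print(norm_2x2(3, 0, 0, -2))
# A generic matrix: the two values should agree to many decimal places.
print(norm_2x2(1, 2, 3, 4), norm_by_sampling(1, 2, 3, 4))
```

Agreement on a few matrices is of course no proof, but it is a quick guard against sign errors when working through the Lagrange multiplier calculation.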
Isomorphisms

We sometimes need to show that two normed spaces are essentially the same, meaning 
that any process involving addition, scalar multiplication, or convergence, in one 
space is mirrored in precise fashion in the other space, and vice-versa. This is the 
idea of an isomorphism. 

Definition 8.11 


An isomorphism between normed vector spaces is a bijective map $T : X \to Y$
such that both $T$ and $T^{-1}$ are linear and continuous. The spaces are then said
to be isomorphic to each other, $X \cong Y$.

An isometric isomorphism is an isomorphism that preserves distance,
$\|Tx\|_Y = \|x\|_X$ for all $x \in X$, and isometrically isomorphic spaces are denoted
by $X \equiv Y$.

We say that $X$ is embedded in $Y$, denoted $X \subset Y$, when $X \cong Z \subseteq Y$ for
some subspace $Z$, and the isomorphism $X \to Z$ is called an embedding.


Thus, isomorphic normed spaces are isomorphic as vector spaces and homeomor- 
phic (in fact equivalent) as metric spaces. Intuitively speaking, if X is embedded in 
Y , one can treat it as if it were a subspace of Y even if its elements are not in Y . 

Isomorphisms are also important in practical applications of functional analysis, 
where linear equations of the type Tx = y, with y given, are very common. Three 
requirements are prescribed for such an equation to be well-posed : (i) a solution 
exists, (ii) the solution is unique, and (iii) the solution is stable, i.e., small variations 
in y do not lead to sudden large changes in x, in other words, x depends continuously 
on $y$. In operator terminology, this means that $T$ is (i) onto, (ii) 1-1, and (iii) $T^{-1}$ is
continuous.

Proposition 8.12 

If $T : X \to Y$ is a bijective linear map, then $T^{-1}$ is linear, and is continuous
when $c\|x\|_X \le \|Tx\|_Y$ for some $c > 0$.

When $T$ is an isomorphism, $\|T^{-1}\| \ge \|T\|^{-1}$.


Proof Let $T$ be a bijective linear map, let $x, y \in Y$, and let $u := T^{-1}x$, $v := T^{-1}y$;
then $T(u + v) = Tu + Tv = x + y$, so that $u + v = T^{-1}(x + y)$. Similarly
$T(\lambda u) = \lambda Tu = \lambda x$ gives $T^{-1}(\lambda x) = \lambda u = \lambda T^{-1}x$. This shows $T^{-1}$ is linear.

The inverse is continuous when $\|T^{-1}y\| \le c\|y\|$ for all $y \in Y$, in particular for
$y = Tx$: $\|x\| \le c\|Tx\|$ for all $x \in X$. Since $T$ is onto, the two inequalities are
logically equivalent.

By the previous proposition, $1 = \|I\| = \|T T^{-1}\| \le \|T\|\|T^{-1}\|$. □
Examples 8.13

1. ► Suppose a vector space $X$ is normed in two ways, giving two normed spaces
$X_{\|\cdot\|}$ and $X_{|||\cdot|||}$. The two norms are equivalent if, and only if, the identity map
$I : X_{\|\cdot\|} \to X_{|||\cdot|||}$ is an isomorphism (Example 7.3(9)); equivalently, there are
constants $c, d > 0$,

\[ \forall x, \quad c\|x\| \le |||x||| \le d\|x\|. \]

For example, $\mathbb{R}^N$ with the 1-norm is equivalent to $\mathbb{R}^N$ with the $\infty$-norm.

2. $\ell^1$ is not isomorphic to $\ell^\infty$. It is not enough to exhibit a sequence, such as
$(1, 1, \ldots)$, which belongs to $\ell^\infty$ but not to $\ell^1$, because such a sequence may,
in principle, correspond to some other sequence in $\ell^1$. One must demonstrate a
property that $\ell^1$ satisfies but $\ell^\infty$ doesn't; e.g. we will show later on that the former,
but not the latter, is separable.

3. ► The inequality $c\|x\| \le \|Tx\|$ ($c > 0$), valid for all $x$ in a Banach space $X$,
implies that $\operatorname{im} T$ is closed and $T$ is 1-1.

Proof If $Tx = Ty$, then $c\|x - y\| \le \|Tx - Ty\| = 0$ and $x = y$. Suppose
$T x_n \to y$ in $Y$; then $c\|x_n - x_m\| \le \|T x_n - T x_m\| \to 0$ as $n, m \to \infty$, so $(x_n)$
is Cauchy and converges to, say, $x \in X$. By continuity of $T$, $T x_n \to Tx = y$,
hence $y \in \operatorname{im} T$ and $\operatorname{im} T$ is closed.
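The equivalence of the 1-norm and the $\infty$-norm on $\mathbb{R}^N$ mentioned in Example 8.13(1) can be seen with the explicit constants $\|x\|_\infty \le \|x\|_1 \le N\|x\|_\infty$. The following sketch checks these two inequalities on random vectors; the dimension `N`, the sample size, and the function names are assumptions made for this illustration.

```python
import random


def norm_1(x):
    """Sum of absolute values of the components."""
    return sum(abs(t) for t in x)


def norm_inf(x):
    """Largest absolute value among the components."""
    return max(abs(t) for t in x)


random.seed(0)
N = 5
vectors = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(1000)]

# c = 1 and d = N serve as the constants in c||x|| <= |||x||| <= d||x||.
# A tiny tolerance guards against floating-point rounding in the sum.
holds = all(
    norm_inf(x) <= norm_1(x) <= N * norm_inf(x) + 1e-12 for x in vectors
)
print(holds)
```

The constant $d = N$ grows with the dimension, which hints at why the two norms are no longer equivalent on sequences of unbounded length.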

Exercises 8.14 

1. (a) The map $\binom{a_1}{a_2} \mapsto (0, a_1, a_2, 0, 0, \ldots)$ embeds $\mathbb{R}^2$ in the real space $\ell^1$.

(b) The map $J : (a_n) \mapsto (a_n/2^n)$, $\ell^\infty \to \ell^1$, is 1-1, linear, and continuous,
but is not an embedding ($\|x\|_{\ell^\infty} \not\le c\|Jx\|_{\ell^1}$).

2. An infinite-dimensional space may be properly embedded in itself: for example,
the right-shift operator $R : \ell^\infty \to \operatorname{im} R \subset \ell^\infty$ is an embedding. This cannot
happen in finite dimensions.

3. Separate each sequence $x = (a_n)$ into two parts $x_e := (a_0, a_2, \ldots)$ and $x_o :=
(a_1, a_3, \ldots)$. Then the map $x \mapsto (x_e, x_o)$ is an isometric isomorphism $\ell^1 \equiv
\ell^1 \times \ell^1$.

4. The space $\ell^1(\mathbb{Z})$ consists of 'sequences' $\ldots, a_{-2}, a_{-1}, a_0, a_1, a_2, \ldots$ such that
$\sum_{n=-\infty}^{\infty} |a_n| < \infty$. It contains $\ell^1$ as a proper subspace, even if $\ell^1 \equiv \ell^1(\mathbb{Z})$.

5. Consider a well-posed linear equation $Tx = y$. An error $\delta y$ in $y$ gives a corresponding
fluctuation $\delta x$ in the solution $x$, $T(x + \delta x) = y + \delta y$. Show that

\[ \frac{\|\delta x\|}{\|x\|} \le \|T^{-1}\|\|T\| \frac{\|\delta y\|}{\|y\|}. \]

The number $\|T^{-1}\|\|T\|$ is called the condition number of $T$. If it is relatively
large, then the equation is said to be ill-conditioned because the relative error of
the solution could be larger than that of the data.

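6. * Let $T : \ell^\infty \to \ell^\infty$ be an operator with matrix coefficients $T_{ij}$, i.e., it maps a
sequence $(a_j)_{j \in \mathbb{N}} \in \ell^\infty$ to $(\sum_{j=0}^{\infty} T_{ij} a_j)_{i \in \mathbb{N}} \in \ell^\infty$. Suppose also that the matrix
is dominated by its diagonal, meaning that for some $c > 0$,

\[ |T_{ii}| - \sum_{j \ne i} |T_{ij}| \ge c \quad \text{for all } i. \]

Then $\|Tx\| \ge c\|x\|$. (Hint: use $|a + b| \ge |a| - |b|$.)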

7. * If $X_1$ and $X_2$ are isomorphic then so are their completions $\tilde X_1 \cong \tilde X_2$.

8. * If $X_1 \cong X_2$ and $Y_1 \cong Y_2$ then $B(X_1, Y_1) \cong B(X_2, Y_2)$.
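The condition number of Exercise 8.14(5) can be made concrete with a nearly singular $2 \times 2$ matrix. The sketch below uses the $\infty$-norm (maximum absolute row sum for matrices); the matrix `T`, the data `y`, the error `dy`, and all helper names are assumptions picked for illustration.

```python
def norm_inf_vec(x):
    """Infinity-norm of a vector."""
    return max(abs(t) for t in x)


def norm_inf_mat(A):
    """Operator norm induced by the infinity-norm: largest absolute row sum."""
    return max(sum(abs(t) for t in row) for row in A)


def inverse_2x2(A):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def apply(A, x):
    """Multiply the matrix A by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Hypothetical ill-conditioned system: the two rows are almost parallel.
T = [[1.0, 1.0], [1.0, 1.01]]
T_inv = inverse_2x2(T)
cond = norm_inf_mat(T) * norm_inf_mat(T_inv)

y = [2.0, 2.01]   # exact data, whose solution is close to (1, 1)
dy = [0.0, 0.01]  # small error in the data
x = apply(T_inv, y)
dx = apply(T_inv, dy)

rel_in = norm_inf_vec(dy) / norm_inf_vec(y)
rel_out = norm_inf_vec(dx) / norm_inf_vec(x)
print(cond, rel_in, rel_out)
```

Here a relative error of about half a percent in the data produces a relative error of order one in the solution, which stays within the bound given by the condition number.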


Projections 

Our next aim is to show firstly that all $n$-dimensional spaces are isomorphic to each
other (for each $n$), and secondly to seek an analogue of the first isomorphism theorem
of vector spaces, namely $V/\ker T \cong \operatorname{im} T$. Accordingly we need to introduce an
important type of operator called a projection, and then construct quotient spaces.

Definition 8.15 


A projection is a continuous linear map $P : X \to X$ such that $P^2 = P$.

For example, shadows are the projection of objects in R 3 
to shapes in a two-dimensional plane; a flat object on the 
ground is its own shadow. 

Playing around with the definition gives a number of conse- 
quences: 

Examples 8.16 

1. $(I - P)^2 = I - 2P + P^2 = I - P$ is also a projection.

2. $(I - P)P = 0$, so $x \in \operatorname{im} P \Leftrightarrow x - Px = 0$, and $\operatorname{im} P = \ker(I - P)$ is a closed
subspace. Similarly $\operatorname{im}(I - P) = \ker(I - I + P) = \ker P$.

3. Any $x \in X$ can be written as $x = Px + (I - P)x \in \operatorname{im} P + \ker P$. If $x \in
\operatorname{im} P \cap \ker P = \ker(I - P) \cap \ker P$, then $x = Px + (I - P)x = 0$, so that
$X = \operatorname{im} P \oplus \ker P$.
4. Any linear map on a Banach space, which satisfies $P^2 = P$, is automatically
continuous when $\operatorname{im} P$ and $\ker P$ are closed subspaces, but more powerful results
are needed to show this (Proposition 11.5).
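The identities of Examples 8.16 are easy to verify on a concrete matrix. The sketch below takes one non-orthogonal projection on $\mathbb{R}^2$; the matrix `P`, the vector `x`, and the helper names are assumptions for illustration and are not taken from the text.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def mat_vec(A, x):
    """Multiply the matrix A by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


I = [[1, 0], [0, 1]]
P = [[1, 1], [0, 0]]  # hypothetical projection onto the first axis
Q = [[I[i][j] - P[i][j] for j in range(2)] for i in range(2)]  # Q = I - P

print(mat_mul(P, P) == P)  # P is idempotent
print(mat_mul(Q, Q) == Q)  # so is I - P
print(mat_mul(Q, P))       # (I - P)P is the zero matrix

# Decompose a vector into its im P and ker P components.
x = [3, 4]
in_image, in_kernel = mat_vec(P, x), mat_vec(Q, x)
print(in_image, in_kernel)
```

The component `in_kernel` is indeed annihilated by `P`, and the two components add up to `x`, in line with $X = \operatorname{im} P \oplus \ker P$.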

Exercises 8.17 

1. Show that the following are projections: 

(a) ra and j GO ; they have the same image, but different kernels, and 
their norms are y/2 and 1 respectively. 

(b) P ^ ^ ^ and Q ^ ^ ^ V ker P — im Q, so P Q — 0 is a projection 
but QP is not. 

(c) RL , where R and L are the shift-operators. 

(d) $x\phi \in B(X)$, where $\phi \in X^*$ and $x \in X$ such that $\phi x = 1$; in this case,
$X = [\![x]\!] \oplus \ker\phi$.

2. If $P$ and $Q$ are commuting projections, then $PQ$ projects onto $\operatorname{im} P \cap \operatorname{im} Q$,
and $P + Q - PQ$ projects onto $\operatorname{im} P + \operatorname{im} Q$.

3. By induction, if $I = P_1 + \cdots + P_n$, with the projections $P_i$ satisfying $P_i P_j = 0$
for $i \ne j$, then $X = \operatorname{im} P_1 \oplus \cdots \oplus \operatorname{im} P_n$.

4. ** Given a closed linear subspace, is there always a projection that maps onto it? 


8.2 Quotient Spaces 

A linear subspace $M$ of a vector space can be translated to form cosets $x + M$. For
example, a straight line $L \subset \mathbb{R}^2$ passing through the origin gives the parallel copies
$x + L$. Except that with some translations, the resulting line is indistinguishable from
$L$; it is easy to see that $x + L = L \Leftrightarrow x \in L$. More generally, $x + L = y + L \Leftrightarrow
x - y \in L$. This latter is an equivalence relation (check!), so the space $\mathbb{R}^2$ 'foliates'
into a stack of parallel lines, each a coset $x + L$. It is obvious that when a line $L$
is translated by $x$, and then by $y$, the result is the line $(x + y) + L$; in fact, since
translation in the direction of $a \in L$ is irrelevant to the coset, one can even talk about
the addition of lines, $(x + L) + (y + L)$ as meaning $x + (y + L)$. Similarly lines can
be stretched, $\lambda(x + L) = \lambda x + L$ (unless $\lambda = 0$), and the distance between lines is
defined in elementary geometry as the minimum distance between them. This space
of parallel lines is a good candidate for a normed space.

Turning to the general case, a vector space partitions into the cosets of $M$ to form
a vector space $X/M$, which is normed when $M$ is closed, and complete when $X$ is
complete:
Proposition 8.18

If $X$ is a normed space and $M$ is a closed linear subspace, then the space
of cosets

\[ X/M := \{ x + M : x \in X \} \]

is a normed space with addition, scalar multiplication, and norm defined
by

\[ (x + M) + (y + M) := (x + y) + M, \]

\[ \lambda(x + M) := \lambda x + M, \]

\[ \|x + M\| := d(x, M) = \inf_{v \in M} \|x - v\|. \]

If $M$ is complete, then $X/M$ is complete $\Leftrightarrow$ $X$ is complete.


Proof That the relation x — y e M is an equivalence relation with equivalence classes 
x + M, and that the defined addition and scalar multiplication of these classes satisfy 
the axioms of a vector space should be clear; the zero coset is M and the negative of 
x + M is —x + M. Let us show that we do indeed get a norm: 

\begin{align*}
\|(x + M) + (y + M)\| &= \|x + y + M\| = \inf_{w \in M} \|x + y - w\| \\
&= \inf_{u, v \in M} \|x + y - u - v\| \\
&\le \inf_{u, v \in M} (\|x - u\| + \|y - v\|) \\
&= \inf_{u \in M} \|x - u\| + \inf_{v \in M} \|y - v\| \\
&= \|x + M\| + \|y + M\|, \\
\|\lambda(x + M)\| &= \|\lambda x + M\| = \inf_{v \in M} \|\lambda x - v\| \\
&= \inf_{u \in M} \|\lambda x - \lambda u\| \quad (\text{for } \lambda \ne 0) \\
&= \inf_{u \in M} |\lambda|\|x - u\| \\
&= |\lambda|\|x + M\|, \\
\|x + M\| &= \inf_{v \in M} \|x - v\| \ge 0.
\end{align*}

$\|x + M\| = 0 \Leftrightarrow d(x, M) = 0 \Leftrightarrow x \in \bar M = M \Leftrightarrow x + M = 0 + M$ (Exercise 2.20(9)).

Completeness. Let $\sum_n (x_n + M)$ be an absolutely convergent series in $X/M$, i.e.,
$\sum_n \|x_n + M\|$ converges. Now, for each $n$, there is a $v_n \in M$ such that

\[ \|x_n - v_n\| \le \|x_n + M\| + 1/2^n. \]

The left-hand side can be summed by comparison with the right, so $\sum_n (x_n - v_n)$
converges to some $x$, since $X$ is complete (Proposition 7.21). Thus

\[ \Bigl\| \sum_{n=1}^{N} (x_n + M) - (x + M) \Bigr\| = \Bigl\| \sum_{n=1}^{N} x_n - x + M \Bigr\| \le \Bigl\| \sum_{n=1}^{N} (x_n - v_n) - x \Bigr\| \to 0 \]

since in general $\|a + M\| \le \|a + v\|$ for any $v \in M$. Hence $\sum_n (x_n + M)$ converges,
along with every other absolutely summable series, and $X/M$ is complete.
Conversely, let $(x_n)$ be a Cauchy sequence in $X$; then

\[ \|(x_n + M) - (x_m + M)\| = \|x_n - x_m + M\| \le \|x_n - x_m\| \]

implies that $(x_n + M)$ is Cauchy in $X/M$, so converges to, say, $x + M$. This means
there are $v_n \in M$ such that $x_n - (x + v_n) \to 0$; but then,

\[ \|v_n - v_m\| \le \|x_n - x_m - v_n + v_m\| + \|x_n - x_m\| \to 0 \]

shows $(v_n)$ is Cauchy in $M$ and converges to, say, $v \in M$. Thus $x_n \to x + v$. □

If $M$ is a linear subspace of $X$ such that $X/M$ is finite dimensional, then its
codimension is defined by $\operatorname{codim} M := \dim(X/M)$.

Examples 8.19 

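1. The cosets of the closed subspace $M := [\![\binom{1}{1}]\!] \subset \mathbb{R}^2$ are the lines parallel to $M$,
and $\mathbb{R}^2/M \cong \mathbb{R}$.

Proof A vector $x$ belongs to $x_0 + M$ when $x = x_0 + t\binom{1}{1}$ for some $t \in \mathbb{R}$, which
is the equation of a line parallel to $\binom{1}{1}$. The map $a \mapsto \binom{a}{0} + M$, $\mathbb{R} \to \mathbb{R}^2/M$ is
linear and continuous. It is bijective since $\binom{a}{b} + M = \binom{a - b}{0} + M$ and

\[ \binom{a_1}{0} - \binom{a_2}{0} \in M \Leftrightarrow \binom{a_1 - a_2}{0} = \lambda\binom{1}{1} \Leftrightarrow a_1 = a_2. \]

The inverse map is continuous as the distance $\|\binom{a}{0} + M\|$ equals $|a|/\sqrt{2}$.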

2. If $X$ is finite-dimensional, then so is $X/M$, with

\[ \operatorname{codim} M = \dim X/M = \dim X - \dim M. \]

Proof Let $e_1, \ldots, e_m$ be a basis for $M$, extended by $e_{m+1}, \ldots, e_n$ to a basis for
$X$. Then, for any vector $x = \sum_{i=1}^{n} \lambda_i e_i$, its coset,

\[ x + M = \sum_{i=1}^{n} \lambda_i e_i + M = \sum_{i=m+1}^{n} \lambda_i (e_i + M), \]

is generated by $e_{m+1} + M, \ldots, e_n + M$. Moreover, these are linearly independent,
since

\[ \sum_{i=m+1}^{n} \lambda_i e_i + M = 0 + M \Leftrightarrow \sum_{i=m+1}^{n} \lambda_i e_i = \sum_{i=1}^{m} \alpha_i e_i \in M \Leftrightarrow \lambda_i = 0, \ i = m + 1, \ldots, n. \]

Hence $\dim X/M = n - m$.

3. If $\phi \in X^*$ then $\|x + \ker\phi\| = \dfrac{|\phi x|}{\|\phi\|}$.

Proof For $\phi x \ne 0$, then $X = [\![x]\!] \oplus \ker\phi$, and

\[ \|\phi\| = \sup_{y \ne 0} \frac{|\phi y|}{\|y\|} = \sup_{\lambda \ne 0,\ a \in \ker\phi} \frac{|\lambda||\phi x|}{\|\lambda x + a\|} = \frac{|\phi x|}{\inf_{a \in \ker\phi} \|x + a\|}. \]
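The coset norm in Example 8.19(1) can be computed directly. The sketch below assumes the Euclidean norm on $\mathbb{R}^2$ (which is what the value $|a|/\sqrt{2}$ suggests) and takes $M$ to be the line spanned by $(1, 1)$; the function name `coset_norm` is hypothetical. The nearest point of $M$ is found by orthogonal projection, which is specific to the Euclidean norm.

```python
import math


def coset_norm(x, m=(1.0, 1.0)):
    """Distance from x to the line spanned by m, i.e. ||x + M|| (Euclidean)."""
    # The closest point of M to x is t*m with t = <x, m> / <m, m>.
    t = (x[0] * m[0] + x[1] * m[1]) / (m[0] ** 2 + m[1] ** 2)
    return math.hypot(x[0] - t * m[0], x[1] - t * m[1])


a = 3.0
print(coset_norm((a, 0.0)), abs(a) / math.sqrt(2))

# The value depends only on the coset: adding an element of M changes nothing.
print(coset_norm((a + 5.0, 5.0)))
```

Both printed distances agree with $|a|/\sqrt{2}$, illustrating that $\|x + M\|$ is well defined on cosets rather than on representatives.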


The following proposition states, in effect, that when one translates a closed linear 
subspace to any distance c < 1 from the origin, the resulting coset intersects the unit 
sphere: 

Proposition 8.20 Riesz’s lemma 


For any non-trivial closed linear subspace $M$, and $0 \le c < 1$, there is a
unit vector $x$ such that $\|x + M\| = c$.


Proof Let $y \notin M$ so that $\|y + M\| > 0$; by re-scaling $y$ if necessary, one can assume
$\|y + M\| = c$. The map $f : M \to \mathbb{R}$, defined by $f(a) := \|y + a\|$, takes values close
to $c$, as well as arbitrarily large values ($\|y + \lambda a\| \ge |\lambda|\|a\| - \|y\| \to \infty$ as $\lambda \to \infty$,
for $M \ne 0$). Since $M$ is connected, and $f$ is continuous, its image must include $]c, \infty[$
by the intermediate value theorem (Proposition 5.6). In particular there is an $a \in M$
such that $\|y + a\| = 1$, so letting $x := y + a$ gives $\|x + M\| = \|y + M\| = c$. □

Exercises 8.21 

1. ► The mapping $x \mapsto x + M$, $X \to X/M$, is linear and continuous.

2. Let $M := \{ f \in C[0, 1] : f(0) = 0 \}$, then $2 + M = \{ f \in C[0, 1] : f(0) = 2 \}$,
and $C[0, 1]/M \cong \mathbb{C}$.
3. (a) $X/X \cong 0$, $X/0 \cong X$.

(b) If $X$, $Y$ are normed spaces, then $\dfrac{X \times Y}{X \times 0} \cong Y$.

4. Let $X$ be a finite-dimensional space generated by a set of unit vectors $E :=
\{ e_i : i = 1, \ldots, n \}$, and let $M_i := [\![E \setminus \{e_i\}]\!]$. Then the coefficient $|\alpha_i|$ in
$x = \sum_{i=1}^{n} \alpha_i e_i$ is at most $\|x\|/\|e_i + M_i\|$. Thus, in finding a basis for $X$, it is
best to select unit vectors that are as 'far' from each other as possible.


5. Let M be a closed subspace of X . If both M and X/M are separable, then so is X. 


8.3 $\mathbb{R}^N$ and Totally Bounded Sets


That finite-dimensional normed spaces ought to be better behaved than infinite- 
dimensional ones is to be expected. What is slightly surprising is the following result 
that they allow only a unique way of defining convergence: Any norm on $\mathbb{C}^N$ is
equivalent to the complete Euclidean norm. This is an example of a mathematical 
“small is beautiful” principle, in the same league of results as “finite integral domains 
are fields”. 

Theorem 8.22 

Every $N$-dimensional normed space over $\mathbb{C}$ is isomorphic to $\mathbb{C}^N$, and so is
complete.

The theorem is also true for real finite-dimensional normed spaces: they are isomorphic
to $\mathbb{R}^N$.

Proof Let $X$ be an $N$-dimensional normed space, with a basis of unit vectors
$e_1, \ldots, e_N$, and let $\mathbb{C}^N$ be given the complete 1-norm (Example 7.16(3)). There
is a map between them, $J : \mathbb{C}^N \to X$, defined by

\[ (\alpha_1, \ldots, \alpha_N) \mapsto \alpha_1 e_1 + \cdots + \alpha_N e_N. \]

Linearity of $J$ follows from the distributive laws of vectors; that it is 1-1 and onto
follow from the linear independence and spanning of $\{e_n\}$ respectively.

$J$ is continuous since

\[ \|Jx\|_X = \|\alpha_1 e_1 + \cdots + \alpha_N e_N\|_X \le |\alpha_1| + \cdots + |\alpha_N| = \|x\|_1. \]

To show $J^{-1}$ is continuous, let $f(x) := \|Jx\|_X$, which is a composition of two
continuous functions: the norm and $J$. The unit sphere $S := \{ u \in \mathbb{C}^N : \|u\|_1 = 1 \}$
is a compact set (since it is closed and bounded in $\mathbb{C}^N \equiv \mathbb{R}^{2N}$ (Corollary 6.20)), so
$fS$ is also compact (thus closed in $\mathbb{R}$). One point that is outside $fS$ is 0,

\[ f(x) = 0 \Leftrightarrow \|Jx\| = 0 \Leftrightarrow Jx = 0 \Leftrightarrow x = 0. \]

Zero is therefore an exterior point contained in an open interval $]-c, c[$ outside $fS$.
This means that $c \le \|Ju\|$ for any unit vector $u$. Applying this to $u = x/\|x\|_1$ for any
(non-zero) vector $x \in \mathbb{C}^N$, we find $c\|x\|_1 \le \|Jx\|$ as required (Proposition 8.12).

Clearly, the proof does not depend critically on the use of complex rather than
real scalars. □

Proposition 8.23 Riesz’s theorem 


A subset $K$ of a normed space $X$ is totally bounded $\Leftrightarrow$ $K$ is bounded and
lies arbitrarily close to finite-dimensional subspaces, meaning

\[ \forall \epsilon > 0, \ \exists Y \ N\text{-dim subspace of } X, \ \forall x \in K, \quad \|x + Y\| < \epsilon. \]

Balls are totally bounded only in finite-dimensional normed spaces.


Proof (i) Let $K \subseteq \bigcup_{i=1}^{N} B_\epsilon(x_i)$ be a totally bounded set in the normed space $X$,
and let $Y := [\![x_1, \ldots, x_N]\!]$. Any point $x \in K$ is covered by some ball $B_\epsilon(x_i)$, i.e.,
$\|x - x_i\| < \epsilon$, so that $\|x + Y\| = \inf_{y \in Y} \|x - y\| < \epsilon$. Since $\epsilon$ can be chosen
arbitrarily small, this proves one implication in the first statement.

In a finite-dimensional normed space, bounded sets are totally bounded: This is
true for $\mathbb{C}^N$ because balls (and their subsets) are totally bounded (Exercise 6.9(2)).
Any finite-dimensional space $Y$ has an isomorphism $J : \mathbb{C}^N \to Y$ by the previous
theorem. If $A$ is a bounded subset of $Y$, $J^{-1}A$ is a bounded set in $\mathbb{C}^N$ (Exercise
4.17(3)), hence totally bounded; mapping back to $Y$, $A = JJ^{-1}A$ is totally bounded
(Proposition 6.7).

For the converse of the proposition, suppose $K$ is bounded by $r$, and lies within $\epsilon$
of an $N$-dimensional subspace $Y$. This means that if $x \in K$ then $\|x\| \le r$, and there
is a $y \in Y$ such that $\|x - y\| < \epsilon$, so

\[ \|y\| \le \|x\| + \|y - x\| < r + \epsilon. \]

But we have just seen that the ball $B_{r+\epsilon}(0) \cap Y$ is totally bounded in $Y$, and can be
covered by a finite number of $\epsilon$-balls, $B_\epsilon(y_i)$, $i = 1, \ldots, n$. In particular, there is
some $y_i$ for which $\|y - y_i\| < \epsilon$, and so

\[ \|x - y_i\| \le \|x - y\| + \|y - y_i\| < 2\epsilon, \quad \Rightarrow \quad K \subseteq \bigcup_{i=1}^{n} B_{2\epsilon}(y_i). \]




(ii) Suppose $X$ has a totally bounded ball, which by re-scaling and translation can
be taken to be the unit ball $B_X$ (Proposition 7.5). It must be within $\epsilon < \tfrac12$ of a
finite-dimensional closed subspace $Y$. In fact $X = Y$, otherwise we can use Riesz's
lemma to find a vector $y \in B_X$ with $d(y, Y) = \|y + Y\| \ge \tfrac12 > \epsilon$. □

Examples 8.24 

1. All norms on $\mathbb{C}^N$ are equivalent.

2. Given a point $x \in X$ and a finite-dimensional subspace $M$, there is always
a best approximation $a \in M$ to $x$. We need only look in the compact ball
$B_{2\|x\|}[0] \cap M$, and since the function $a \mapsto \|a - x\|$ on it is continuous,
it achieves the minimum (Corollary 6.16).

For example, there is always a polynomial of degree at most $n$ that best approximates
a function with respect to any given norm.

3. If $M$ is a complete subspace of a normed space, and $N$ a finite-dimensional
subspace, then $M + N$ is complete (see Example 7.11(2)).

Proof It is enough to show that $M + [\![e]\!]$ is complete when $e \notin M$: the result then
follows by induction. For any $x \in M$, $\alpha \in \mathbb{C}$,

\[ |\alpha|\|e + M\| = \|\alpha e + M\| \le \|\alpha e + x\|, \]

\[ \|x\| \le \|x + \alpha e\| + |\alpha|\|e\| \le c\|\alpha e + x\|. \]

So if $(x_n + \alpha_n e)$ is a Cauchy sequence in $M + [\![e]\!]$, then so are $(\alpha_n)$ and $(x_n)$, in
$\mathbb{C}$ and $M$ respectively. Hence, $x_n + \alpha_n e \to x + \alpha e \in M + [\![e]\!]$.

Exercises 8.25 

1. Totally bounded sets cannot be open (or have a proper interior) in an infinite 
dimensional normed space. 

2. The set of polynomials of degree at most $n$ forms a closed linear subspace of
$L^1[a, b]$ with dimension $n + 1$; a basis for this space is $1, x, \ldots, x^n$.

3. As an illustration of Riesz's theorem, the unit ball in the infinite-dimensional space
$\ell^\infty$ (or $\ell^1$) is not totally bounded. (Hint: Show $(e_n)$ has no Cauchy subsequence.)

4. In finite-dimensional normed spaces only, the compact sets are the closed and 
bounded ones. 

5. Totally bounded sets need not lie in a finite-dimensional subspace, just arbitrarily
close to them. Can you think of an infinite-dimensional totally bounded set?
6. * A space generated by an infinite but countable number of linearly independent
vectors, $[\![e_1, e_2, \ldots]\!]$, cannot be complete: the linear subspaces $[\![e_1]\!], [\![e_1, e_2]\!], \ldots$
are closed in it, and do not have an interior (why?), so by Baire's theorem their
union cannot be a complete metric space.

Remarks 8.26 

1 . Continuous operators are widely referred to as being “bounded”; except for the 
zero operator, their image is certainly not bounded! The reason they are called so 
is that, being Lipschitz maps, they send bounded sets to bounded sets. The usage 
of “bounded” is avoided in this text, in favor of the equivalent term “continuous”. 

2. By analogy with matrices, it is customary to write $Tx$ instead of $T(x)$. This is a
slight abuse of notation; a linear map on the vector space of matrices need not act
on the left, e.g. $A \mapsto AB$, $A \mapsto AB + BA$, $A \mapsto A^{\top}$, and $A \mapsto B^{-1}AB$ are all
linear.

3. For the initiated, the idea of continuous linear maps can be extended to continuous
multi-linear maps (tensors); they also form a Banach space with norm

\[ \|T\| := \sup |T(x_1, \ldots, \phi_1, \ldots)| / (\|x_1\| \cdots \|\phi_1\| \cdots). \]

4. $B(X, Y)$ forms part of the larger space of Lipschitz functions $X \to Y$. For such
functions, $\|f\| := \sup_{x_1 \ne x_2} \|f(x_1) - f(x_2)\|/\|x_1 - x_2\|$ satisfies the norm
axioms, except that $\|f\| = 0 \Leftrightarrow f$ is constant.



Chapter 9 

Main Examples 


Having fleshed out a substantial amount of abstract theory, we turn to the concrete 
examples of normed spaces and identify which are complete and separable. Unavoid- 
ably, the proofs become more technical once we leave the familiarity of finite dimen- 
sions and enter the realm of infinite-dimensional spaces, having to deal as it were 
with sequences of sequences or functions and different types of norms. However, a 
careful study of this section will be rewarded by having an armory of spaces, so to 
speak, ready to serve as examples to confirm or refute conjectured statements. 


9.1 Sequence Spaces 


The Space l°° 

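A sequence in $\ell^\infty$ is a sequence of sequences, $x_n = (a_{ni})$. Convergence in $\ell^\infty$ means
uniform convergence of the components, that is,

\begin{align*}
x_n \to 0 &\Leftrightarrow \sup_i |a_{ni}| \to 0 \text{ as } n \to \infty \\
&\Leftrightarrow a_{ni} \to 0 \text{ as } n \to \infty, \text{ uniformly for all components } i, \\
&\Leftrightarrow \forall \epsilon > 0, \ \exists N, \ \forall n \ge N, \ \forall i, \quad |a_{ni}| < \epsilon.
\end{align*}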


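For example, of the following three sequences of sequences, only the first converges
to $0$, even though each component converges to 0.

\[
\begin{array}{lll}
(1, 1, 1, 1, \ldots) & (1, 0, 0, 0, \ldots) & (1, 1, 1, 1, 1, \ldots) \\
(\tfrac12, \tfrac12, \tfrac12, \tfrac12, \ldots) & (0, 1, 0, 0, \ldots) & (0, 0, 1, 1, 1, \ldots) \\
(\tfrac13, \tfrac13, \tfrac13, \tfrac13, \ldots) & (0, 0, 1, 0, \ldots) & (0, 0, 0, 0, 1, \ldots) \\
\quad\downarrow & \quad\downarrow & \quad\downarrow \\
(0, 0, 0, 0, \ldots) & (0, 0, 0, 0, \ldots) & (0, 0, 0, 0, \ldots)
\end{array}
\]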
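The difference between the three examples shows up as soon as the $\ell^\infty$-norms of the rows are computed. The sketch below truncates each sequence to its first `K` components (an assumption needed to work with finite lists; the truncation length and the function names are hypothetical) and lists the sup-norms of the first ten rows.

```python
K = 50  # number of components kept from each (infinite) sequence


def constant_row(n):
    """n-th row of the first example: (1/n, 1/n, 1/n, ...)."""
    return [1.0 / n] * K


def unit_row(n):
    """n-th row of the second example: a single 1 in position n."""
    return [1.0 if i == n - 1 else 0.0 for i in range(K)]


def tail_row(n):
    """n-th row of the third example: 2(n-1) zeros followed by 1s."""
    return [1.0 if i >= 2 * (n - 1) else 0.0 for i in range(K)]


def sup_norm(x):
    """Sup-norm of a finite list, standing in for the l-infinity norm."""
    return max(abs(t) for t in x)


rows = {"constant": constant_row, "unit": unit_row, "tail": tail_row}
norms = {name: [sup_norm(row(n)) for n in range(1, 11)] for name, row in rows.items()}

for name, values in norms.items():
    print(name, values)
```

Only the first list of norms decreases towards 0; the other two stay equal to 1, so those sequences converge componentwise but not in $\ell^\infty$.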


J. Muscat, Functional Analysis, DOI: 10.1007/978-3-319-06728-5_9,
© Springer International Publishing Switzerland 2014
Theorem 9.1


$\ell^\infty$ is complete but not separable.


Proof (i) Let $(x_n)$ be a Cauchy sequence in $\ell^\infty$, i.e., $\|x_n - x_m\|_{\ell^\infty} \to 0$ as
$n, m \to \infty$. Note that $\|x_n\|_{\ell^\infty} \le c$ since Cauchy sequences are bounded (Example 4.3(5)).

\[
\begin{array}{ccccll}
x_1 & = & a_{11} & a_{12} & a_{13} \ \ldots & \le \|x_1\|_{\ell^\infty} \\
x_2 & = & a_{21} & a_{22} & a_{23} \ \ldots & \le \|x_2\|_{\ell^\infty} \\
\downarrow & & \downarrow & \downarrow & \downarrow & \\
x & = & a_1 & a_2 & a_3 \ \ldots & \le c
\end{array}
\]

(The absolute signs of $a_{ni}$ are omitted in the horizontal rows.)

For each column $i$, $|a_{ni} - a_{mi}| \le \|x_n - x_m\|_{\ell^\infty} \to 0$, so $(a_{ni})_n$ is a Cauchy
sequence in $\mathbb{C}$, which converges to, say, $a_i := \lim_{n\to\infty} a_{ni}$.

That $x := (a_i)$ is in $\ell^\infty$ follows from taking the limit $n \to \infty$ of

\[ |a_{ni}| \le \|x_n\|_{\ell^\infty} \le c. \]

More crucially, $x_n \to x$ in $\ell^\infty$ since, for each column $i$ and any $n \in \mathbb{N}$, one can
choose an $m \ge n$ large enough that $|a_{mi} - a_i| < 1/n$, so that

\[ |a_i - a_{ni}| \le |a_i - a_{mi}| + |a_{mi} - a_{ni}| \le \frac{1}{n} + \|x_m - x_n\|_{\ell^\infty} \to 0, \]

as $n \to \infty$, independently of $i$.

(ii) To show $\ell^\infty$ is not separable we display an uncountable number of disjoint balls
(Exercise 4.21(4)). Consider the sequences that consist of 1s and 0s. The distance
between any two of them is exactly 1, so that the balls centered on them with radius
1/2 are disjoint. Moreover, these sequences are uncountable for the same reason that
the real numbers are uncountable: If one were able to list them as

\begin{align*}
x_1 &= (a_{11}, a_{12}, a_{13}, \ldots) \\
x_2 &= (a_{21}, a_{22}, a_{23}, \ldots) \\
x_3 &= (a_{31}, a_{32}, a_{33}, \ldots) \\
&\ \ \vdots
\end{align*}

one could take the diagonal sequence $(a_{11}, a_{22}, \ldots)$, and swap its 1s and 0s, giving
a sequence $(1 - a_{nn})$ that cannot be in the list, for $1 - a_{nn} \ne a_{nn}$. □
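The diagonal argument in part (ii) can be acted out on a finite list. In the sketch below the four rows stand for the first four terms of four listed 0-1 sequences; the data and the names `diagonal_flip` and `sup_distance` are assumptions for illustration. Flipping the diagonal produces a row that differs from the $n$-th listed row in position $n$, hence lies at sup-distance 1 from every one of them.

```python
def diagonal_flip(rows):
    """Swap the 0s and 1s of the diagonal entries rows[n][n]."""
    return [1 - rows[n][n] for n in range(len(rows))]


def sup_distance(x, y):
    """Sup-norm distance between two finite 0-1 lists of equal length."""
    return max(abs(a - b) for a, b in zip(x, y))


# Hypothetical beginning of a purported list of all 0-1 sequences.
rows = [
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
]

new_row = diagonal_flip(rows)
print(new_row)
print([sup_distance(new_row, row) for row in rows])
```

Since the new row is at distance 1 from each listed row, the balls of radius 1/2 around them are indeed pairwise disjoint, as the proof requires.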
Proposition 9.2


The space of convergent complex sequences, and of those sequences that
converge to 0,

\[ c := \{ (a_n) : \exists a \in \mathbb{C}, \ \lim_{n\to\infty} a_n = a \}, \]

\[ c_0 := \{ (a_n) : \lim_{n\to\infty} a_n = 0 \}, \]

are complete separable subspaces of $\ell^\infty$.


Proof The spaces are nested in each other as $c_0 \subset c \subset \ell^\infty$ since convergent
sequences are bounded. They are easily shown to be linear subspaces: $a_n + b_n \to
a + b$, $\lambda a_n \to \lambda a$ when $a_n \to a$ and $b_n \to b$ as $n \to \infty$.

$c_0$ is closed in $\ell^\infty$: Let $x_n \to x$ in $\ell^\infty$, with $x_n \in c_0$; their components converge
uniformly $a_{ni} \to a_i$ as $n \to \infty$.

\[
\begin{array}{ccccl}
x_1 & a_{11} & a_{12} & a_{13} \ \ldots & \to 0 \\
x_2 & a_{21} & a_{22} & a_{23} \ \ldots & \to 0 \\
\downarrow & \downarrow & \downarrow & \downarrow & \\
x & a_1 & a_2 & a_3 \ \ldots & \overset{?}{\to} 0
\end{array}
\]

Now, for any $\epsilon > 0$, there is an $x_n$ in $c_0$ such that $\|x_n - x\|_{\ell^\infty} < \epsilon$, and for this
sequence, there is an integer $N$, such that

\[ i \ge N \Rightarrow |a_{ni}| < \epsilon. \]

It follows that for $i \ge N$,

\[ |a_i| \le |a_{ni}| + |a_i - a_{ni}| \le |a_{ni}| + \|x - x_n\|_{\ell^\infty} < 2\epsilon \]

so $\lim_{i\to\infty} a_i = 0$ and $x \in c_0$.

$c_0$ is separable: The vectors $e_n := (\delta_{ni}) = (0, \ldots, 0, 1, 0, \ldots)$, with the 1 occurring
at the $n$th position, form a Schauder basis for $c_0$: for any $x = (a_n) \in c_0$,

\[ \Bigl\| x - \sum_{n=0}^{N} a_n e_n \Bigr\|_{\ell^\infty} = \sup_{n > N} |a_n| \to 0, \quad \text{as } N \to \infty. \]

If $\sum_n a_n e_n = \sum_n b_n e_n$ then $(a_0 - b_0, a_1 - b_1, \ldots) = 0$ hence $a_n = b_n$ and the
coefficients are unique.
The spaces $c$ and $c_0$ are isomorphic: Let $J : c \to c_0 \subset \ell^\infty$ be defined by
$J(a_0, a_1, a_2, \ldots) := (-a, a_0 - a, a_1 - a, \ldots)$, where $a := \lim_{n\to\infty} a_n$.

$J$ is 1-1 since

\[ J(a_n) = J(b_n) \Rightarrow a = b \text{ and } \forall n, \ a_n - a = b_n - b \Rightarrow (a_n) = (b_n). \]

$J$ is onto $c_0$ for, given any $y = (b_n) \in c_0$, it is clear that $x := (b_1 - b_0, b_2 - b_0, \ldots)$
maps to it. In fact, writing $1 := (1, 1, \ldots)$,

\[ Jx = Rx - a1, \qquad J^{-1}y = Ly - b_0 1, \]

where $R$ and $L$ are the shift operators. This observation shows that both $J$ and $J^{-1}$
are continuous and linear since $(a_n) \mapsto a$, as well as $(b_n) \mapsto b_0$, are functionals

\[ |a| = |\lim_{n\to\infty} a_n| = \lim_{n\to\infty} |a_n| \le \sup_n |a_n| = \|(a_n)\|_{\ell^\infty}, \]

\[ |b_0| \le \sup_n |b_n| = \|(b_n)\|_{\ell^\infty}. \]

It follows that $c$ has the same properties of completeness and separability that $c_0$
enjoys. □

Theorem 9.3

$c_0^* \equiv \ell^1$



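Proof Given $y = (b_n) \in \ell^1$ and $x = (a_n) \in c_0$, the inequality

\[ |y \cdot x| = \Bigl| \sum_{n=0}^{\infty} b_n a_n \Bigr| \le \sum_{n=0}^{\infty} |b_n||a_n| \le \sup_n |a_n| \sum_{n=0}^{\infty} |b_n| = \|x\|_{\ell^\infty}\|y\|_{\ell^1} \]

shows that the linear map $y^{\top} : x \mapsto y \cdot x := \sum_n b_n a_n$ (Example 8.4(4)) is
well-defined and continuous on $\ell^\infty$ (including $c_0$), with $\|y^{\top}\| \le \|y\|_{\ell^1}$.

Every functional on $c_0$ is of this type: By the linearity and continuity of any
$\phi \in c_0^*$,

\[ \phi x = \phi\Bigl(\sum_{n=0}^{\infty} a_n e_n\Bigr) = \sum_{n=0}^{\infty} a_n \phi e_n = \sum_{n=0}^{\infty} a_n b_n = y \cdot x, \]

where $b_n := \phi e_n$, $y := (b_n)$.
Also, writing $b_n = |b_n|e^{i\theta_n}$ in polar form,

\[ \sum_{n=0}^{N} |b_n| = \sum_{n=0}^{N} e^{-i\theta_n} b_n = \phi\Bigl(\sum_{n=0}^{N} e^{-i\theta_n} e_n\Bigr) \le \|\phi\|\|(e^{-i\theta_n})\|_{\ell^\infty} = \|\phi\|, \]

hence $y \in \ell^1$, with $\|y\|_{\ell^1} \le \|\phi\| = \|y^{\top}\|$. Combined with the above, we get
$\|y\|_{\ell^1} = \|y^{\top}\|$.

Isometric isomorphism: Let $J : \ell^1 \to c_0^*$ be the map $y \mapsto y^{\top}$. The above
conclusions can be summarized as stating that $J$ is an onto isometry. That $J$ is linear
is easily seen from the following statement that holds for every $x \in c_0$, $u, v, y \in \ell^1$,

\[ (u + v) \cdot x = \sum_{n=0}^{\infty} (u_n + v_n)a_n = \sum_{n=0}^{\infty} u_n a_n + \sum_{n=0}^{\infty} v_n a_n = u \cdot x + v \cdot x, \]

\[ (\lambda y) \cdot x = \sum_{n=0}^{\infty} (\lambda b_n)a_n = \lambda \sum_{n=0}^{\infty} b_n a_n = \lambda(y \cdot x), \]

so $(u + v)^{\top} = u^{\top} + v^{\top}$ and $(\lambda y)^{\top} = \lambda y^{\top}$. □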

Exercises 9.4 

1. The kernel of the functional $\operatorname{Lim} : (a_n) \mapsto \lim_{n\to\infty} a_n$ on $c$, is $c_0$.

2. Any convergent complex sequence $a_n \to a$ can be written as

\[ (a_n) = \sum_n (a_n - a)e_n + a1, \]

where $1 := (1, 1, \ldots)$. Deduce that the vectors $e_n$ together with $1$ form a
Schauder basis for $c$; what is its dual space $c^*$?

3. ► One can multiply bounded sequences together as $(a_n)(b_n) := (a_n b_n)$, to
get another bounded sequence, $\|xy\|_{\ell^\infty} \le \|x\|_{\ell^\infty}\|y\|_{\ell^\infty}$. This multiplication is
commutative and associative, and has unity $1$. Only those sequences which are
bounded away from 0 (i.e., $|a_n| \ge c > 0$) have an inverse, namely $(a_n)^{-1} =
(a_n^{-1})$.

4. * The inequality $\|xy\|_{\ell^1} \le \|x\|_{\ell^\infty}\|y\|_{\ell^1}$ is also true, so the map $x \mapsto M_x$, where
$M_x y := xy$, embeds $\ell^\infty$ in

5. The vector space $[\![e_0, e_1, \ldots]\!]$ is often denoted by $c_{00}$; it is the space of sequences
with a finite number of non-zero terms. Its closure in the $\ell^\infty$-norm is $\overline{c_{00}} = c_0$.

6. $c_0$ contains the space of sequences $\ell^\infty_s := \{ (a_n) : \exists c, \ \forall n \ge 1, \ |a_n| \le c/n^s \}$
($s > 0$). What is its closure? Can you think of a sequence which is in $c_0$ but not
in any $\ell^\infty_s$?

7. The distance between a sequence $(a_n) \in \ell^\infty$ and $c_0$ is $\limsup_n |a_n|$.

8. * $C[0, 1]$ can be embedded in $\ell^\infty$, since $f \in C[0, 1]$ is determined by its values
on the dense subset $\mathbb{Q} \cap [0, 1]$ which can be listed as a sequence $(q_n)$. Check that
the mapping $f \mapsto (f(q_n))$ is linear and isometric. The Banach-Mazur theorem
states that every separable Banach space is embedded in $C[0, 1]$.

The Space $\ell^1$

Convergence in $\ell^1$ is more stringent than in $\ell^\infty$. This can be seen by the inequality

$$\forall x = (a_i) \in \ell^1, \quad \|x\|_{\ell^\infty} = \sup_i |a_i| = \max_i |a_i| \le \sum_{i=0}^\infty |a_i| = \|x\|_{\ell^1},$$

so $x_n \to 0$ in $\ell^1$ forces $x_n \to 0$ in $\ell^\infty$, but $x_n \to 0$ in $\ell^\infty$ does not guarantee $x_n \to 0$ in $\ell^1$. For the latter to occur, not
only must the components approximate 0 together, but their sum must also diminish.
Fewer sequences manage to do this, and this is reflected in the fact that $\ell^1$ is separable.
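
The gap between the two modes of convergence can be seen on a concrete family. The following is a minimal Python sketch, not taken from the text: the helper names `norm_sup`, `norm_1` and `flat` are illustrative, and the sequences are finitely supported so that the norms are exact. It uses $x_n := (1/n, \ldots, 1/n, 0, \ldots)$ with $n$ non-zero terms, for which the $\ell^\infty$-norm tends to 0 while the $\ell^1$-norm stays equal to 1.

```python
from fractions import Fraction

def norm_sup(x):
    """sup-norm of a finitely supported sequence."""
    return max((abs(a) for a in x), default=0)

def norm_1(x):
    """l^1-norm of a finitely supported sequence."""
    return sum(abs(a) for a in x)

def flat(n):
    """x_n := (1/n, ..., 1/n, 0, ...) with n non-zero terms."""
    return [Fraction(1, n)] * n

for n in (1, 10, 100, 1000):
    x = flat(n)
    # the first norm shrinks like 1/n, the second stays constant
    print(n, norm_sup(x), norm_1(x))
```

Exact rational arithmetic is used only to avoid rounding noise; floats would show the same behaviour.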

Theorem 9.5 


$\ell^1$ is complete and separable.


Proof (i) Since $\ell^1 \equiv c_0^*$, one can argue that $\ell^1$ is complete, as are all dual spaces
(Theorem 8.7).

Alternatively, the following direct proof shows that every absolutely summable
series in $\ell^1$ converges (Proposition 7.21). (Note: as $\ell^1$ is defined in terms of sums,
it is more straightforward to use series instead of Cauchy sequences.) Suppose
$x_1 + x_2 + \cdots$ is a series such that $\sum_n \|x_n\|_{\ell^1} = s$. In the following diagram, we
will show convergence of the various vertical sums.

$$\begin{array}{ccccccccc}
x_1 & : & a_{10} & + & a_{11} & + & a_{12} & + \cdots & \|x_1\|_{\ell^1} \\
+ & & + & & + & & + & & + \\
x_2 & : & a_{20} & + & a_{21} & + & a_{22} & + \cdots & \|x_2\|_{\ell^1} \\
+ & & + & & + & & + & & + \\
\vdots & & \vdots & & \vdots & & \vdots & & \vdots \\
x & : & a_0 & & a_1 & & a_2 & \cdots & s
\end{array}$$

(Note that the absolute signs of $a_{ni}$ are omitted in the horizontal sums.)

The main point of the proof is that any rectangular sum of terms in this array is
less than the corresponding sum on the right-hand column:

$$\Big|\sum_{i=I}^{J} \sum_{n=M}^{N} a_{ni}\Big| \le \sum_{i=I}^{J} \sum_{n=M}^{N} |a_{ni}| \le \sum_{n=M}^{N} \|x_n\|_{\ell^1}.$$

In particular, taking the $i$th column, $|\sum_n a_{ni}| \le \sum_n |a_{ni}| \le s$ shows that it converges
in $\mathbb{C}$ to, say, $a_i := \sum_{n=1}^\infty a_{ni}$. In fact, the whole array sum is bounded, $\sum_i |a_i| =
\sum_i |\sum_n a_{ni}| \le s$, so that $x := (a_i)$ belongs to $\ell^1$.

Finally, note that any rectangular sum goes to 0 as it moves downward, because
$\sum_{n=N}^\infty \|x_n\|_{\ell^1} \to 0$ as $N \to \infty$. Hence

$$\Big\|x - \sum_{n=1}^{N} x_n\Big\|_{\ell^1} = \sum_{i=0}^\infty \Big|a_i - \sum_{n=1}^{N} a_{ni}\Big| = \sum_{i=0}^\infty \Big|\sum_{n=N+1}^\infty a_{ni}\Big| \le \sum_{n=N+1}^\infty \|x_n\|_{\ell^1} \to 0,$$

giving $x = \sum_{n=1}^\infty x_n$.

(ii) The sequences $e_n := (0, \ldots, 0, 1, 0, \ldots)$, with the 1 occurring at the $n$th position,
form a Schauder basis because for any vector $x = (a_n) \in \ell^1$,

$$\Big\|x - \sum_{n=0}^{N} a_n e_n\Big\|_{\ell^1} = \|(a_0, a_1, \ldots) - (a_0, \ldots, a_N, 0, 0, \ldots)\|_{\ell^1}$$

$$= \|(0, \ldots, 0, a_{N+1}, \ldots)\|_{\ell^1}$$

$$= \sum_{n=N+1}^\infty |a_n| \to 0 \text{ as } N \to \infty$$

since $\sum_n |a_n|$ converges. If $x = \sum_{n=0}^\infty b_n e_n$ as well, then $b_m = e_m \cdot x = a_m$ for
each $m \in \mathbb{N}$, so the coefficients are unique and the $e_n$ form a Schauder basis. □

Proposition 9.6 


Every functional on $\ell^1$ is of the type $(a_n) \mapsto \sum_n b_n a_n$ where $(b_n) \in \ell^\infty$,

and

$\ell^{1*} \equiv \ell^\infty$.


Proof The proof is practically identical to the one for $c_0^* \equiv \ell^1$, except that now
$y = (b_n) \in \ell^\infty$ and $x = (a_n) \in \ell^1$. The inequality

$$|y \cdot x| \le \sum_n |b_n||a_n| \le \sup_n |b_n| \sum_n |a_n| = \|y\|_{\ell^\infty}\|x\|_{\ell^1}$$

shows that the linear mapping $y^\top : \ell^1 \to \mathbb{C}$ is well-defined and continuous with

$$\|y^\top\| \le \|y\|_{\ell^\infty}.$$

Every functional on $\ell^1$ is of this type: Let $\phi \in \ell^{1*}$, then by linearity and continuity
of $\phi$,





$$\phi x = \phi\Big(\sum_{n=0}^\infty a_n e_n\Big) = \sum_{n=0}^\infty a_n b_n = y \cdot x, \quad \text{where } b_n := \phi e_n,\ y := (b_n).$$

Moreover $|b_n| = |\phi e_n| \le \|\phi\| \|e_n\|_{\ell^1} = \|\phi\|$, so that $y \in \ell^\infty$, with $\|y\|_{\ell^\infty} \le \|\phi\|$.

As $\phi = y^\top$, $\|y\|_{\ell^\infty} = \|y^\top\|$.

Isomorphism: The mapping $J : \ell^\infty \to \ell^{1*}$, $y \mapsto y^\top$, is linear and the above
assertions state that $J$ is an onto isometry. □

Exercises 9.7 

1. Suppose each coefficient of $x_n = (a_{ni}) \in \ell^1$ converges, $a_{ni} \to a_i$ as $n \to \infty$,
and let $x := (a_i) \in \ell^1$; then it does not follow that $x_n \to x$ in $\ell^1$, e.g. $e_n \not\to 0$.
But if $|a_{ni} - a_i|$ is decreasing with $n$ (for each $i$), then $x_n \to x$ in $\ell^1$.

2. ► $\ell^1$ has a natural product, called convolution:

$$(a_n) * (b_n) := \Big( a_0 b_0,\ a_1 b_0 + a_0 b_1,\ a_2 b_0 + a_1 b_1 + a_0 b_2,\ \ldots,\ \sum_{i=0}^{n} a_{n-i} b_i,\ \ldots \Big).$$

This is indeed in $\ell^1$ because the sum to $n$ terms (a triangle of terms $a_i b_j$) is less
than $(|a_0| + \cdots + |a_n|)(|b_0| + \cdots + |b_n|)$ (a square of terms), so that

$$\|x * y\|_{\ell^1} \le \|x\|_{\ell^1}\|y\|_{\ell^1}.$$

Convolution is commutative and associative, and $e_0$ acts as the identity element,
$e_0 * x = x$. The inverse of $(1, a, 0, \ldots)$ is $(1, -a, a^2, -a^3, \ldots)$, which is in $\ell^1$
only when $|a| < 1$.

3. If $x \in \ell^1$, but $y \in \ell^\infty$, then $x * y$ is a bounded sequence,

$$\|x * y\|_{\ell^\infty} \le \|x\|_{\ell^1}\|y\|_{\ell^\infty}.$$


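4. The right-shift operator can be written as a convolution $Rx = e_1 * x$. In general,
$R^n x = e_n * x$, since $e_n * e_m = e_{n+m}$. The “running average” of a “time-series”
$x$ over $N$ terms is the convolution $\frac{1}{N}(1, \ldots, 1, 0, \ldots) * x$, with $N$ ones.
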
5. * A subset $K$ of $\ell^1$ is totally bounded $\Leftrightarrow$ it is bounded and

$$\forall \epsilon > 0,\ \exists N \in \mathbb{N},\ \exists n_1, \ldots, n_N,\ \forall x \in K, \quad \|x|_{\mathbb{N} \setminus \{n_1, \ldots, n_N\}}\|_{\ell^1} < \epsilon.$$

(Recall that $K$ lies arbitrarily close to finite-dimensional subspaces.)

6. $\ell^1$ has the functional $\mathrm{Sum}(b_n) := \sum_n b_n$. It corresponds to the bounded
sequence $\mathbf{1} = (1, 1, \ldots)$, i.e., $\mathrm{Sum}\, x = \mathbf{1} \cdot x$. Hence if $\sum_n \sum_i |a_{ni}| < \infty$ then

$$\sum_i \sum_n a_{ni} = \sum_n \sum_i a_{ni}.$$

7. The functionals $\delta_n(a_m) := a_n$ correspond to $e_n \in \ell^\infty$, i.e., $\delta_n x = e_n \cdot x$.
Similarly, the sum $\mathrm{Sum}_N(a_n) := \sum_{n=0}^{N} a_n$ corresponds to $e_0 + \cdots + e_N =
(1, \ldots, 1, 0, \ldots)$. Since $(1, \ldots, 1, 0, \ldots) \not\to \mathbf{1}$ in $\ell^\infty$, we also have $\mathrm{Sum}_N \not\to
\mathrm{Sum}$ in $\ell^{1*}$, yet $\mathrm{Sum}_N(x) \to \mathrm{Sum}(x)$ for any sequence $x \in \ell^1$. We’ll discuss
this apparent paradox in a later section (Section 11.5).
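
The convolution product of Exercise 2 can be tried out on finitely supported sequences. The sketch below is illustrative only: `convolve` and `norm_1` are names chosen here, and the infinite inverse $(1, -a, a^2, \ldots)$ is truncated to $K$ terms, so the product with $(1, a, 0, \ldots)$ equals $e_0$ up to a single remainder term $-(-a)^K$ in position $K$.

```python
from fractions import Fraction

def convolve(x, y):
    """(x*y)_n = sum_i x_{n-i} y_i for finitely supported sequences."""
    out = [0] * (len(x) + len(y) - 1)
    for i, a in enumerate(x):
        for j, b in enumerate(y):
            out[i + j] += a * b
    return out

def norm_1(x):
    """l^1-norm of a finitely supported sequence."""
    return sum(abs(a) for a in x)

x = [1, -2, 3]
y = [4, 0, -1, 2]
z = convolve(x, y)
# submultiplicativity of the l^1-norm under convolution
print(norm_1(z), "<=", norm_1(x) * norm_1(y))

# truncated inverse of (1, a, 0, ...) for |a| < 1
a, K = Fraction(1, 2), 5
truncated_inverse = [(-a) ** k for k in range(K)]
product = convolve([1, a], truncated_inverse)
print(product)  # e_0 plus one small remainder term at the end
```

The remainder term has size $|a|^K$, which shrinks as $K$ grows exactly when $|a| < 1$, matching the condition stated in the exercise.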

The Space $\ell^2$

This normed space has properties that are, in many respects, midway between $\ell^1$ and
$\ell^\infty$. Yet it stands out, as it has a dot product $x \cdot y$ defined for any two of its sequences,
and $x \cdot x = \|x\|^2$; we will have much more to say about normed spaces with such
dot products in the next chapter.

Theorem 9.8 


$\ell^2$ is complete and separable.


Proof (i) Let $x_n = (a_{ni})$ be a Cauchy sequence in $\ell^2$; the terms are uniformly
bounded $\|x_n\| \le c$. For each $i$,

$$|a_{ni} - a_{mi}| \le \sqrt{\sum_i |a_{ni} - a_{mi}|^2} = \|x_n - x_m\| \to 0 \text{ as } n, m \to \infty,$$

so $(a_{ni})$ is a complex Cauchy sequence which converges to, say, $a_i := \lim_{n\to\infty} a_{ni}$.
The sequence $x := (a_i)$ belongs to $\ell^2$ by taking the limit $N \to \infty$ of

$$\sum_{i=0}^{N} |a_i|^2 = \lim_{n\to\infty} \sum_{i=0}^{N} |a_{ni}|^2 \le \lim_{n\to\infty} \|x_n\|^2 \le c^2.$$

As $x_n$ is Cauchy, for each $\epsilon > 0$ there is a positive integer $M$ such that
$n, m \ge M \Rightarrow \|x_n - x_m\| < \epsilon$.

Moreover, for each $i \in \mathbb{N}$, there exists an integer $M_i$ such that

$$m \ge M_i \Rightarrow |a_{mi} - a_i| < \epsilon / 2^i.$$

Therefore, for any $N \in \mathbb{N}$ and $n \ge M$, picking $m$ larger than $M, M_0, M_1, \ldots, M_N$ gives

$$\sqrt{\sum_{i=0}^{N} |a_{ni} - a_i|^2} \le \sqrt{\sum_{i=0}^{N} |a_{ni} - a_{mi}|^2} + \sqrt{\sum_{i=0}^{N} |a_{mi} - a_i|^2} < \epsilon + \sqrt{\sum_{i=0}^{N} \frac{\epsilon^2}{4^i}} < 3\epsilon,$$

which implies $\|x_n - x\| < 3\epsilon$ for $n \ge M$.

(ii) For separability, $\ell^2$ has the Schauder basis $e_n$, since for any $x = (a_n) \in \ell^2$,

$$\Big\|x - \sum_{n=0}^{N} a_n e_n\Big\|_{\ell^2} = \|(0, \ldots, 0, a_{N+1}, \ldots)\|_{\ell^2} = \sqrt{\sum_{n=N+1}^\infty |a_n|^2} \to 0.$$

Uniqueness of the coefficients follows as in the proof of Theorem 9.5. □

Proposition 9.9 

Every functional on $\ell^2$ is of the type $(a_n) \mapsto \sum_n b_n a_n$ where $(b_n) \in \ell^2$,

and

$\ell^{2*} \equiv \ell^2$.

‘Proof’. The argument is so similar to the previous ones about $c_0^*$ and $\ell^{1*}$ that it is
left as an exercise (use Cauchy’s inequality at one point).

Exercises 9.10 

1. Show that $|x \cdot y| = \|x\| \|y\|$ if, and only if, $y$ is a multiple of $x$ (or $x = 0$).

2. The map $(a_1, \ldots, a_N) \mapsto (a_1, \ldots, a_N, 0, 0, \ldots)$ embeds $\mathbb{C}^N$ in $\ell^2$.

3. $\ell^2$ contains the interesting compact convex set $\{ (a_n) : |a_n| \le 1/n \}$, called the
Hilbert cube. It is totally bounded in $\ell^2$, as it is close within any $\epsilon$ to a finite-dimensional
space $\{ (a_n) : \forall n > N_\epsilon, a_n = 0 \}$, yet it is infinite-dimensional; it
cannot enclose any ball (else the ball would be totally bounded).

4. ► The various sequence spaces are subsets of each other as follows:

$c_{00} \subset \ell^1 \subset \ell^2 \subset c_0 \subset c \subset \ell^\infty$, because $\|x\|_{\ell^\infty} \le \|x\|_{\ell^2} \le \|x\|_{\ell^1}$,

but $\ell^1 \subset \ell^2 \subset c_0$ are not Banach space embeddings! Show further that $c_{00}$ with
the respective norms is dense in $\ell^1$, $\ell^2$, and $c_0$ ($c_{00}$ cannot be complete in any
norm, Exercise 8.25(6)).
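
That the inclusion $\ell^1 \subset \ell^2$ is strict can be illustrated with the harmonic sequence $(1, 1/2, 1/3, \ldots)$. The following Python sketch (the function name `partial_sums` is chosen here, and floating-point partial sums only suggest, rather than prove, the limiting behaviour) compares $\sum 1/n$, which grows like $\log N$, with $\sum 1/n^2$, which stays below $\pi^2/6$.

```python
import math

def partial_sums(N):
    """Partial sums of sum 1/n and sum 1/n^2 for n = 1..N."""
    s1 = sum(1.0 / n for n in range(1, N + 1))
    s2 = sum(1.0 / n**2 for n in range(1, N + 1))
    return s1, s2

for N in (10, 1000, 100000):
    s1, s2 = partial_sums(N)
    # s1 keeps pace with log N; s2 settles below pi^2/6
    print(N, round(s1, 4), round(math.log(N), 4), round(s2, 6))
```

So the harmonic sequence has finite $\ell^2$-norm but is not absolutely summable.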
The Space $\ell^p$

The space $\ell^p := \{ (a_n) : a_n \in \mathbb{C}, \sum_n |a_n|^p < \infty \}$, $p \ge 1$, is endowed with addition
and scalar multiplication like the other sequence spaces, and the norm

$$\|x\|_{\ell^p} := \Big(\sum_{n=0}^\infty |a_n|^p\Big)^{1/p}.$$

Our aim in this section is to prove the triangle inequality for this norm, otherwise
known as Minkowski’s inequality, and show $\ell^p$ is complete and separable.

As the reader is probably becoming aware, it is inequalities that are at the heart
of most proofs about continuity, including isomorphisms. They can be thought of as
a ‘process’ transforming numbers from one form to another, perhaps more useful,
form, but losing some information on the way. Much like tools to be chosen with
care, some are “sharper” than others. (See [8] for much more.) The following three
inequalities are continually used in analysis. The first is a gem, simple yet rich:

$$a^\alpha b^\beta \le \alpha a + \beta b, \quad \text{for } \alpha, \beta, a, b \ge 0,\ \alpha + \beta = 1. \qquad (9.1)$$

This inequality states that any weighted geometric mean is less than or equal to
the same-weighted arithmetic mean. The special case $\sqrt{ab} \le (a + b)/2$ has already
been encountered previously. Writing $a = e^x$, $b = e^y$ gives

$$e^{\alpha x + \beta y} \le \alpha e^x + \beta e^y.$$

This is equivalent to the convexity of the
exponential function, and can be taken as its
proof (any real function with a positive second
derivative is convex).

The same idea applied to the convexity of $x^p$, $p \ge 1$, gives

$$(\alpha a + \beta b)^p \le \alpha a^p + \beta b^p, \quad \text{for } \alpha, \beta, a, b \ge 0,\ \alpha + \beta = 1. \qquad (9.2)$$

A third inequality of importance is

$$a^p + b^p \le (a + b)^p, \quad \text{for } p \ge 1,\ a, b \ge 0. \qquad (9.3)$$

Its normalized form $1 + t^p \le (1 + t)^p$, for $t = b/a \ge 0$, can be obtained by
comparing their derivatives $p t^{p-1} \le p(1 + t)^{p-1}$, as they start from the same value
at $t = 0$.
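
A quick numerical sanity check of (9.1), (9.2) and (9.3) is sketched below in Python. It is only a spot check on random samples, not a proof; the helper names `leq` and `check_inequalities`, the sampling ranges, and the rounding tolerance are all choices made here for illustration.

```python
import random

def leq(lhs, rhs, rel=1e-12, abs_tol=1e-12):
    """lhs <= rhs, allowing for floating-point rounding."""
    return lhs <= rhs * (1 + rel) + abs_tol

def check_inequalities(trials=1000, seed=0):
    """Test (9.1), (9.2), (9.3) on random non-negative samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.uniform(0, 10), rng.uniform(0, 10)
        alpha = rng.random()
        beta = 1 - alpha          # weights with alpha + beta = 1
        p = rng.uniform(1, 5)     # exponent p >= 1
        ok_91 = leq(a**alpha * b**beta, alpha * a + beta * b)
        ok_92 = leq((alpha * a + beta * b)**p, alpha * a**p + beta * b**p)
        ok_93 = leq(a**p + b**p, (a + b)**p)
        if not (ok_91 and ok_92 and ok_93):
            return False
    return True

print(check_inequalities())
```

Equality in (9.1) and (9.2) occurs when $a = b$, which is why a small tolerance is used in the comparison.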
Proposition 9.11

For $a, b, \alpha, \beta \ge 0$, $\alpha + \beta = 1$, $p \ge 1$, $q > 0$,

$$\begin{aligned}
\min(a, b) &\le (\alpha a^{-q} + \beta b^{-q})^{-1/q} && \text{(harmonic mean)} \\
&\le a^\alpha b^\beta && \text{(geometric mean)} \\
&\le (\alpha a^{1/p} + \beta b^{1/p})^p \\
&\le \alpha a + \beta b && \text{(arithmetic mean)} \\
&\le (\alpha a^p + \beta b^p)^{1/p} && \text{(root-mean-“square”)} \\
&\le \max(a, b).
\end{aligned}$$

Proof (i) If $a \le b$ (without loss of generality), then $a^q \le b^q$, so

$$\frac{\alpha}{a^q} + \frac{\beta}{b^q} \le \frac{\alpha + \beta}{a^q} = \frac{1}{a^q},$$

which is equivalent to the first inequality of the proposition.

(ii) The second inequality is equivalent to $a^{-\alpha q} b^{-\beta q} \le \alpha a^{-q} + \beta b^{-q}$, which is (9.1)
with $a$, $b$ replaced by $a^{-q}$, $b^{-q}$ respectively.

(iii) Similarly, the third inequality is essentially $a^{\alpha/p} b^{\beta/p} \le \alpha a^{1/p} + \beta b^{1/p}$, which
is (9.1) with $a$, $b$ replaced by $a^{1/p}$, $b^{1/p}$ respectively.

(iv) If $a$, $b$ in (9.2) are substituted by $a^{1/p}$ and $b^{1/p}$ one obtains $(\alpha a^{1/p} + \beta b^{1/p})^p \le
\alpha a + \beta b$.

(v) The fifth inequality is precisely (9.2), while the sixth one follows easily if we
assume, say, $a \le b$; for then, $a^p \le b^p$, so $\alpha a^p + \beta b^p \le (\alpha + \beta) b^p = b^p$.


Substituting $q/p$ for $p$ in (9.3), when $p \le q$, and $a^p$ for $a$, $b^p$ for $b$, yields

$$(a^q + b^q)^{1/q} \le (a^p + b^p)^{1/p} \quad \text{for } 0 < p \le q,$$

and furthermore, substituting $\alpha^{1/q} a$ for $a$ and $\beta^{1/q} b$ for $b$ in this inequality, gives

$$(\alpha a^q + \beta b^q)^{1/q} \le (\alpha^{p/q} a^p + \beta^{p/q} b^p)^{1/p},$$

which is implicit in the scheme of inequalities above.

□


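An induction proof generalizes all these inequalities to arbitrary sums or products,

$$a_1^{\alpha_1} \cdots a_n^{\alpha_n} \le \alpha_1 a_1 + \cdots + \alpha_n a_n \le (\alpha_1 a_1^p + \cdots + \alpha_n a_n^p)^{1/p}, \qquad (9.4)$$

when $a_i, \alpha_i \ge 0$, $\alpha_1 + \cdots + \alpha_n = 1$, $p \ge 1$, as well as

$$(a_1^q + \cdots + a_n^q)^{1/q} \le (a_1^p + \cdots + a_n^p)^{1/p}$$

for $p \le q$.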
Hermann Minkowski (1864-1909) studied under Lindemann (of
π-transcendentality fame) at the University of Königsberg, together
with Hilbert. At 19 years of age, two years before he
graduated with a thesis on quadratic forms, he had already
won the prestigious French Academy’s Grand Prix. Starting
1889, he developed his “geometry of numbers” ideas on lattices,
including his inequality. After teaching in Zurich (where Einstein
was a student), he moved to Göttingen, became interested
in physics and presented his version of special relativity as a
unified space-time.

Fig. 9.1 Minkowski

This last inequality remains valid for infinite sums, $\|x\|_{\ell^q} \le \|x\|_{\ell^p}$ when $p \le q$,
implying $\ell^p \subseteq \ell^q$. It shows that a bounded sequence lies in a whole range of $\ell^p$
spaces, down to some infimum $p$.

Proposition 9.12 Minkowski’s inequality

$$\|x + y\|_{\ell^p} \le \|x\|_{\ell^p} + \|y\|_{\ell^p}, \quad \text{where } 1 \le p \le \infty.$$

Proof All norms in this proof are taken to be the $\ell^p$-norm. Let $u = (a_n)$ and $v = (b_n)$
be two sequences in $\ell^p$. Summing the inequality $(\alpha|a| + \beta|b|)^p \le \alpha|a|^p + \beta|b|^p$
($\alpha + \beta = 1$, $\alpha, \beta \ge 0$) for a sequence of terms gives

$$\sum_n |\alpha a_n + \beta b_n|^p \le \sum_n (\alpha|a_n| + \beta|b_n|)^p \le \alpha \sum_n |a_n|^p + \beta \sum_n |b_n|^p,$$

or $\|\alpha u + \beta v\|^p \le \alpha\|u\|^p + \beta\|v\|^p$.

Substituting $u = x/\|x\|$, $v = y/\|y\|$, $\alpha = \|x\|/(\|x\| + \|y\|)$, $\beta = \|y\|/(\|x\| + \|y\|)$,
gives

$$\frac{\|x + y\|}{\|x\| + \|y\|} = \|\alpha u + \beta v\| \le (\alpha + \beta)^{1/p} = 1.$$

□


Proposition 9.13 Hölder’s inequality

$$|x \cdot y| \le \|x\|_{\ell^p} \|y\|_{\ell^{p'}}, \quad \text{where } \frac{1}{p} + \frac{1}{p'} = 1,\ p \ge 1.$$

Proof Substitute $a^{1/\alpha}$ and $b^{1/\beta}$ instead of $a$ and $b$ in $a^\alpha b^\beta \le \alpha a + \beta b$, with $\alpha = 1/p$,
$\beta = 1/p'$, to get

$$ab \le \frac{a^p}{p} + \frac{b^{p'}}{p'}.$$

Summing this for a sequence of terms in $\mathbb{C}$ leads to

$$|u \cdot v| \le \sum_n |a_n b_n| \le \sum_n \Big(\frac{|a_n|^p}{p} + \frac{|b_n|^{p'}}{p'}\Big) = \frac{1}{p}\|u\|^p_{\ell^p} + \frac{1}{p'}\|v\|^{p'}_{\ell^{p'}}. \qquad (9.5)$$

In particular, for unit vectors $u = x/\|x\|_{\ell^p}$, $v = y/\|y\|_{\ell^{p'}}$, we get Hölder’s inequality,

$$\frac{|x \cdot y|}{\|x\|_{\ell^p}\|y\|_{\ell^{p'}}} \le \frac{1}{p} + \frac{1}{p'} = 1.$$
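
Minkowski’s and Hölder’s inequalities, together with the monotonicity $\|x\|_{\ell^q} \le \|x\|_{\ell^p}$ for $p \le q$, can be spot-checked on short real vectors. The Python sketch below is illustrative: `norm_p` and `conjugate` are names introduced here, the sample vectors are arbitrary, and floating-point arithmetic makes this a sanity check rather than a verification.

```python
import math

def norm_p(x, p):
    """l^p-norm of a finite sequence; p may be math.inf."""
    if math.isinf(p):
        return max(abs(a) for a in x)
    return sum(abs(a) ** p for a in x) ** (1 / p)

def conjugate(p):
    """Conjugate exponent p' with 1/p + 1/p' = 1, for p > 1."""
    return p / (p - 1)

x = [3.0, -1.0, 2.0, 0.5]
y = [1.0, 4.0, -2.0, 0.25]
p = 3.0
q = conjugate(p)  # 1.5

dot = sum(a * b for a, b in zip(x, y))
holder_ok = abs(dot) <= norm_p(x, p) * norm_p(y, q)

s = [a + b for a, b in zip(x, y)]
minkowski_ok = norm_p(s, p) <= norm_p(x, p) + norm_p(y, p)

# norms decrease as the exponent increases
chain = [norm_p(x, r) for r in (1, 2, 3, math.inf)]
monotone_ok = all(u >= v for u, v in zip(chain, chain[1:]))

print(holder_ok, minkowski_ok, monotone_ok)
```

The conjugate exponent is computed from $p' = p/(p-1)$, which is just a rearrangement of $1/p + 1/p' = 1$.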


Proposition 9.14

For $p > 1$, $\ell^p$ is a separable Banach space, with $\ell^{p*} \equiv \ell^{p'}$, where $\frac{1}{p} + \frac{1}{p'} = 1$.

Proof Minkowski’s inequality is the non-trivial part in showing that $\ell^p$ is indeed a
normed space. It is separable with the Schauder basis $e_n$, since for any $x = (a_n) \in \ell^p$,
the series $\sum_n |a_n|^p$ converges to $\|x\|^p_{\ell^p}$, so

$$\Big\|x - \sum_{n=0}^{N} a_n e_n\Big\|^p_{\ell^p} = \|(0, \ldots, 0, a_{N+1}, \ldots)\|^p_{\ell^p} = \sum_{n=N+1}^\infty |a_n|^p \to 0,$$

so $x = \sum_n a_n e_n$. The coefficients are unique since if $x = \sum_n b_n e_n = (b_0, b_1, \ldots)$,
then $b_n = a_n$.

Dual of $\ell^p$: Any vector $y \in \ell^{p'}$ acts on $\ell^p$ via $x \mapsto y \cdot x$, with the latter being
finite by Hölder’s inequality $|y \cdot x| \le \|y\|_{\ell^{p'}} \|x\|_{\ell^p}$. By Exercise 2 below, there is an
$x \in \ell^p$ which makes this an equality. Thus $\|y^\top\| = \|y\|_{\ell^{p'}}$.

Conversely, let $\phi$ be a functional on $\ell^p$; then $\phi x = \sum_{n=0}^\infty a_n b_n = y \cdot x$, where
$b_n := \phi e_n$, $y := (b_n)$. Writing $b_n = |b_n| e^{i\theta_n}$ and noting $p(p' - 1) = p'$,

$$\sum_{n=0}^{N} |b_n|^{p'} = \Big|\phi\Big(\sum_{n=0}^{N} e^{-i\theta_n} |b_n|^{p'-1} e_n\Big)\Big| \le \|\phi\| \Big(\sum_{n=0}^{N} |b_n|^{p'}\Big)^{1/p}.$$

Dividing by the right-hand series gives $\big(\sum_{n=0}^{N} |b_n|^{p'}\big)^{1/p'} \le \|\phi\|$; as $N$ is arbitrary,
$y \in \ell^{p'}$.

Completeness: In common with all dual spaces, $\ell^p \equiv \ell^{p'*}$ is complete (or from
an argument similar to the one for $\ell^2$). □

Exercises 9.15 

1. Given $1 \le q < p$, find an example of a sequence which is in $\ell^p$ but not in $\ell^q$.

2. For each $y \in \ell^{p'}$, find a sequence $x \in \ell^p$ which makes Hölder’s inequality an
equality.

3. Generalized Hölder’s inequalities

$$\|xy\|_{\ell^r} \le \|x\|_{\ell^p}\|y\|_{\ell^q}, \quad \text{where } \frac{1}{p} + \frac{1}{q} = \frac{1}{r},$$

$$\Big|\sum_n a_n b_n c_n\Big| \le \|(a_n)\|_{\ell^p}\|(b_n)\|_{\ell^q}\|(c_n)\|_{\ell^r}, \quad \text{where } \frac{1}{p} + \frac{1}{q} + \frac{1}{r} = 1.$$

(Hint: Apply Hölder’s inequality to the product $|a_n|^r |b_n|^r$.)

4. Littlewood’s inequality: $\|x\|_{\ell^r} \le \|x\|^\alpha_{\ell^p}\|x\|^{1-\alpha}_{\ell^q}$, where $\frac{1}{r} = \frac{\alpha}{p} + \frac{1-\alpha}{q}$.

(Hint: Apply the generalized Hölder’s inequality above to $|a_n|^\alpha |a_n|^{1-\alpha}$, using
$p/\alpha$ and $q/(1 - \alpha)$ instead of $p$ and $q$.)

5. * Young’s inequality:

$$\|x * y\|_{\ell^r} \le \|x\|_{\ell^p}\|y\|_{\ell^q},$$

where $\frac{1}{p} + \frac{1}{q} = 1 + \frac{1}{r}$, $p, q, r \ge 1$ (Exercises 9.7(2, 3)).

Justify the steps of the following proof. First note that $\frac{1}{p'} + \frac{1}{q'} + \frac{1}{r} = 1$ (where
$\frac{1}{p'} = 1 - \frac{1}{p}$ etc.); then using the second generalized Hölder’s inequality above
on the positive numbers $a_n$, $b_n$, $c_n$, and an exquisite juggling of indices (where
$k = n - m$),

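$$\sum_{n=0}^{N} \sum_{m=0}^{n} a_{n-m} b_m c_n = \sum_{n=0}^{N} \sum_{m=0}^{n} \big(a_k^{p/r} b_m^{q/r}\big)\big(a_k^{p/q'} c_n^{r'/q'}\big)\big(b_m^{q/p'} c_n^{r'/p'}\big)$$

$$\le \Big(\sum_{k,m} a_k^p b_m^q\Big)^{1/r} \Big(\sum_{n,k} a_k^p c_n^{r'}\Big)^{1/q'} \Big(\sum_{n,m} b_m^q c_n^{r'}\Big)^{1/p'}$$

$$\le \Big(\sum_{n=0}^{N} a_n^p\Big)^{1/p} \Big(\sum_{n=0}^{N} b_n^q\Big)^{1/q} \Big(\sum_{n=0}^{N} c_n^{r'}\Big)^{1/r'}.$$

Hence if $(c_n) \in \ell^{r'}$, $(a_n) \in \ell^p$, and $(b_n) \in \ell^q$, then $(a_n) * (b_n) \in (\ell^{r'})^* = \ell^r$.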

6. * Prove the reverse Minkowski inequality for $0 < p \le 1$, and positive real
sequences $x = (a_n)$, $y = (b_n)$, $a_n, b_n \ge 0$,

$$\|x\|_p + \|y\|_p \le \|x + y\|_p.$$

(Hint: the reverse inequality has its roots in $x^p$ being concave.)

9.2 Function Spaces 

Much of the above can be generalized from sequences to functions, where summation
$\sum_n a_n$ becomes integration $\int f(t)\,dt$. For example, the proof that $\ell^\infty$ is complete
generalizes to the space $L^\infty(\mathbb{R})$, practically untouched. Even though it is function
spaces that are at the heart of “functional analysis”, we do not prove all these generalizations
here, as laying the groundwork for integration and measures would take
us too far afield. Instead a review is provided, referring the reader to [6] for more
details. However, we allow for vector-valued functions, because it does not incur any
extra difficulty. To avoid confusion with the scalar $\|f\|$, we write $|f|$ for the function
$x \mapsto \|f(x)\|$.

Lebesgue Measure on $\mathbb{R}^N$

1. A measure $\mu$ on $\mathbb{R}^N$ is an assignment of positive numbers or $\infty$ to certain
subsets $E \subseteq \mathbb{R}^N$ with the properties that it be

(i) additive, $\mu(E \cup F) = \mu(E) + \mu(F)$ for $E$, $F$ disjoint,

(ii) continuous, $E_n \to E \Rightarrow \mu(E_n) \to \mu(E)$.

We haven’t defined a distance function on sets, but it is enough for now to
take $E_n \to E$ to mean that $E_n$ is a decreasing sequence of sets of finite
measure, with $\bigcap_n E_n = E$.

One final property that we expect $\mu$ to satisfy, at least in $\mathbb{R}^N$, is

(iii) a translated copy of a set has the same measure, $\mu(E + x) = \mu(E)$.

Examples of measures are the standard length, area, and volume of Euclidean
geometry.

2. Taking $\mathbb{R}$ as our main example, and defining $\mu[0, 1[ := 1$, these properties
completely determine the length of any interval, namely $\mu[a, b] = b - a =
\mu[a, b[$. (Hint: divide $[0, 1[$ into equal intervals to show $\mu[0, m/n] = m/n$.)

3. As a first step in constructing $\mu$ on $\mathbb{R}$, therefore, the length of any interval is
defined to be the difference of its endpoints, e.g. $m[a, b] := b - a$. This function
can be extended in two ways to

(a) the length of any countable union of disjoint intervals

$$m\Big(\bigcup_n I_n\Big) := \sum_n m(I_n),$$

(b) the length of the set obtained by removing a countable union of disjoint
subintervals from a bounded interval

$$m\Big(I \setminus \bigcup_n I_n\Big) := m(I) - \sum_n m(I_n).$$

4. For general sets, define

$$m^*(A) := \inf\Big\{ m(U) : A \subseteq U = \bigcup_n I_n \Big\},$$

$$m_*(A) := \sup\Big\{ m(K) : A \supseteq K = I \setminus \bigcup_n I_n \Big\}.$$

(Note that since we are taking the infimum and supremum, respectively, we might
as well take $I$ to be a closed and bounded interval and $I_n$ to be open intervals,
in which case $U$ is an open set, and $K$ a compact set.)

It is a fact that there exist sets for which these two values do not agree (see [6]).
A “well-behaved” set, called measurable, satisfies $m^*(E) = m_*(E)$, which is
then called its Lebesgue measure $\mu(E)$.

5. $m^*(\bigcup_n A_n) \le \sum_n m^*(A_n)$ and $A \subseteq B \Rightarrow m^*(A) \le m^*(B)$ (since open covers
for each $A_n$ provide an open cover for their union). Of course, these statements
continue to hold for $\mu$ applied to measurable sets.

6. A useful equivalent criterion of measurability of $E$ is:

For any set $A$, $m^*(E \cap A) + m^*(E^c \cap A) = m^*(A)$.

7. Using this criterion, it follows that, for $E$, $F$, and $E_n$ measurable sets,

(a) $E^c$, $E \cup F$, $E \cap F$, $E \setminus F$, and $E \triangle F$ are measurable; when $E$ and $F$ are disjoint,
$\mu(E \cup F) = \mu(E) + \mu(F)$.

(b) $\bigcup_{n=1}^\infty E_n$ and $\bigcap_{n=1}^\infty E_n$ are measurable, and when $E_n$ are disjoint,

$$\mu\Big(\bigcup_{n=1}^\infty E_n\Big) = \sum_{n=1}^\infty \mu(E_n).$$

The sets that can be obtained by starting with the intervals and applying these
constructions are called Borel sets; they include the open and closed sets.

8. Sets with ($m^*$-)measure 0 are obviously measurable and are called null sets. For
example, any countable set is null; but most null sets are uncountable, e.g. the
Cantor set. The countable union of null sets is null.

Adding (or removing) a null set $N$ from a measurable set $E$ does not affect its
measure,

$$\mu(E \cup N) = \mu(E) + \mu(N) = \mu(E).$$

Because measures don’t distinguish sets up to a null set, we say that two sets
are equal almost everywhere, $E = F$ a.e., when they differ by a null set. More
generally, we qualify a statement “$P$ a.e.” when $P(t)$ is true for all $t$ except on
a null set; for example, we say $f = g$ a.e. when $f(t) = g(t)$ for all $t$ in their
domain apart from a null set.

9. The distance between measurable sets is defined as $d(E, F) := \mu(E \triangle F)$. It is
a metric, with the proviso that $d(E, F) = 0 \Leftrightarrow E = F$ a.e. The measure $\mu$ is
continuous with respect to it, $E_n \to E \Rightarrow \mu(E_n) \to \mu(E)$.

10. A similar procedure gives the Lebesgue measure on $\mathbb{R}^N$, with the modification
that cuboids are used instead of intervals to generate the measurable sets. Most
subsets of $\mathbb{R}^N$ that the reader is likely to have encountered are measurable,
including the unit sphere and ball in $\mathbb{R}^3$.
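
Note 8 mentions the Cantor set as an uncountable null set. The Python sketch below (the function names `cantor_intervals` and `total_length` are chosen here for illustration) builds the finite unions of closed intervals that cover the Cantor set after $k$ middle-third removals. Their total length is $(2/3)^k$, so the outer measure of the Cantor set is at most $(2/3)^k$ for every $k$, hence 0.

```python
from fractions import Fraction

def cantor_intervals(k):
    """Closed intervals left after k middle-third removals from [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined.append((a, a + third))   # keep the left third
            refined.append((b - third, b))   # keep the right third
        intervals = refined
    return intervals

def total_length(intervals):
    """Length of a disjoint union of intervals, as in Note 3(a)."""
    return sum(b - a for a, b in intervals)

for k in range(6):
    cover = cantor_intervals(k)
    print(k, len(cover), total_length(cover))
```

Each stage doubles the number of intervals and divides their length by three, which is where the factor $2/3$ comes from.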


Measurable Functions

1. The characteristic function of a set is defined by

$$1_E(t) := \begin{cases} 1 & t \in E \\ 0 & t \notin E. \end{cases}$$

Linear combinations of characteristic functions $\sum_{n=1}^{N} 1_{E_n} x_n$, where $E_n$ are bounded
measurable subsets of $\mathbb{R}$ and $x_n \in \mathbb{C}$, are called simple functions (or step functions).
More generally, $\mathbb{R}$ can be replaced by a fixed measurable set $A$, and $x_n$
can belong to a Banach space $X$. The simple functions form a vector space $S$.

2. A function $f : A \to X$ is said to be measurable when it is the pointwise limit
of simple functions, $s_n \to f$ a.e. For real-valued functions, this is equivalent to
$f^{-1}]a, \infty[$ being measurable for all $a \in \mathbb{R}$.

Note that simple functions supported in $E$ (i.e., are zero outside $E$) can converge
only to measurable functions supported in $E$ (since $s_n 1_E \to f 1_E$ a.e.).

3. Measurable functions form a vector space: $\lambda f$ and $f + g$ are measurable when
$f$, $g$ are. It follows from $\big| |s_n| - |f| \big| \le |s_n - f|$ that $|f| : A \to \mathbb{R}$ is measurable.
For real-valued measurable functions, $fg$, $\max(f, g)$, and $\sup_n f_n$ are also
measurable. Real-valued continuous functions are measurable.

4. ► In fact the space of measurable functions is in a sense complete: if $f_n$ are
measurable and $f_n \to f$ a.e., then $f$ is measurable.

5. $L^\infty(A)$ is defined as the space of (equivalence classes of) bounded measurable
functions $f : A \to \mathbb{C}$, over a measurable set $A$, with the supremum
norm $\|f\|_{L^\infty} := \sup_{t\ \mathrm{a.e.}} |f(t)|$, i.e., the smallest real number $c$ such that
$|f(t)| \le c$ a.e.($t$).

6. $L^\infty(\mathbb{R})$ contains the closed subspace of bounded continuous functions $C_b(\mathbb{R})$,
which in turn contains $C_0(\mathbb{R}) := \{ f \in C(\mathbb{R}) : \lim_{t\to\pm\infty} f(t) = 0 \}$. The space
$C[a, b]$ is embedded in $C_0(\mathbb{R})$.

7. $L^\infty[a, b]$ is not separable: the uncountable number of characteristic functions
$1_{[x,y]}$, $a < x < y < b$, are at unit distance from each other.
Proposition 9.16

$L^\infty(A)$ is a Banach space.

Proof If $|f(t)| \le \|f\|_{L^\infty}$ except on the null set $E_1$, and $|g(t)| \le \|g\|_{L^\infty}$ except on
the null set $E_2$, then for all $t \in A \setminus (E_1 \cup E_2)$,

$$|f(t) + g(t)| \le |f(t)| + |g(t)|, \qquad |\lambda f(t)| = |\lambda||f(t)|,$$

so $\|f + g\|_{L^\infty} \le \|f\|_{L^\infty} + \|g\|_{L^\infty}$, $\|\lambda f\|_{L^\infty} = |\lambda|\|f\|_{L^\infty}$.

Clearly $\|f\|_{L^\infty} = 0$ only when $|f(t)| = 0$ a.e. It follows that $L^\infty(A)$ is a normed
space, as long as we identify ae-equal functions into equivalence classes.

Completeness: Let $f_n \in L^\infty(A)$ be a Cauchy sequence, where $|f_n(t)| \le \|f_n\|_{L^\infty}$
for all $t \in A$ except in some null set $E_n$. Copying the proof of the completeness of
$\ell^\infty$ (Theorem 9.1),

$$|f_n(t) - f_m(t)| \le \|f_n - f_m\|_{L^\infty} \to 0$$

for each $t \in A$, except possibly on the null set $\bigcup_n E_n$, so $f_n(t)$ is Cauchy and
converges $f_n(t) \to f(t)$ a.e.($t$). The function $f$ is evidently measurable, and
$f_n \to f$ uniformly away from this null set, since for any $\epsilon > 0$ and $n$ large enough
(but independent of $t$),

$$|f_n(t) - f(t)| \le |f_n(t) - f_m(t)| + |f_m(t) - f(t)|$$

$$\le \|f_n - f_m\|_{L^\infty} + |f_m(t) - f(t)| \quad \text{a.e.}(t)$$

$$< 2\epsilon$$

where $m \ge n$ is chosen, depending on $t$, to make $|f_m(t) - f(t)| < \epsilon$. This means
that $f_n \to f$ in $L^\infty$, and implies $\|f\|_{L^\infty} \le \|f - f_n\|_{L^\infty} + \|f_n\|_{L^\infty} < \infty$, so
$f \in L^\infty(A)$. □


Integrable Functions

1. Given a set $E$ of finite measure and its characteristic function, let $\int 1_E := \mu(E)$.
For a simple function, define its integral

$$\int \sum_{n=1}^{N} 1_{E_n} x_n := \sum_{n=1}^{N} \mu(E_n) x_n.$$

It is well-defined, since a simple function has a unique representation in terms
of disjoint $E_n$. It is straightforward to verify that $\int (s + r) = \int s + \int r$ and
$\int \lambda s = \lambda \int s$ for $s, r \in S$.

2. The function $\|s\| := \int |s| = \sum_n \mu(E_n)\|x_n\|$ is a norm on $S$. Here, $|s|$ is the
real-valued simple function $|s| := \sum_n 1_{E_n}\|x_n\| \ge 0$. In particular, for real-valued
simple functions, $r \le s \Rightarrow \int r \le \int s$.

Proof (i) $\|s + r\| = \sum_n \mu(E_n)\|x_n + y_n\| \le \sum_n \mu(E_n)(\|x_n\| + \|y_n\|) = \|s\| + \|r\|$,

(ii) $\|\lambda s\| = \sum_n \mu(E_n)\|\lambda x_n\| = |\lambda|\|s\|$,

(iii) $\int |s| = 0$ when $\sum_n \mu(E_n)\|x_n\| = 0$. This implies $\mu(E_n)\|x_n\| = 0$ for all
$n$, i.e., $x_n = 0$ or $\mu(E_n) = 0$, so $s = 0$ a.e.

3. The integral is a continuous functional on $S$, $\|\int s\| \le \int |s|$, since

$$\Big\|\int s\Big\| = \Big\|\sum_n \mu(E_n) x_n\Big\| \le \sum_n \mu(E_n)\|x_n\| = \int |s|.$$

4. The space of real (or complex) simple functions with this norm is separable (the
simple functions with $x_n \in \mathbb{Q}$ and $E_n$ equal to intervals with rational endpoints
are countable and dense), but not complete.

5. A Cauchy sequence of simple functions converges a.e. to a measurable function.
Proof Let $s_n$ be a Cauchy sequence in $S$. Given any $\epsilon > 0$, let

$$E_N^\epsilon := \{ t \in \mathbb{R} : \exists n, m \ge N,\ \|s_n(t) - s_m(t)\| \ge \epsilon \},$$
€ h(E n ) = 



oo. 


This shows that for $t$ not in the null set $F_\epsilon := \bigcap_N E_N^\epsilon$,

$$\exists N,\ \forall n, m \ge N, \quad \|s_n(t) - s_m(t)\| < \epsilon.$$

In particular, for $t$ not in the null set $\bigcup_k F_{1/k}$, $\|s_n(t) - s_m(t)\| \to 0$ as $n, m \to
\infty$. Thus, except for a null set, $(s_n(t))$ is a Cauchy sequence in $X$ and hence
converges.

6. A function $f : \mathbb{R} \to X$ is said to be integrable when it is the ae-limit of a Cauchy
sequence of simple functions $s_n \to f$ a.e. Its integral is given by the extension
of the integral on $S$,

$$\int f := \lim_{n\to\infty} \int s_n.$$

Note that $\int s_n$ is a Cauchy sequence in $X$ ($\|\int s_n - \int s_m\|_X \le \int |s_n - s_m| \to 0$).
The space of (equivalence classes of) integrable functions $\mathbb{R} \to X$ is denoted
by $L^1(\mathbb{R}, X)$; it is the completion of $S$ (Theorem 4.6). By Proposition 7.17, the
space $L^1(\mathbb{R}, X)$ is a normed vector space with

$$\|f\|_{L^1} := \lim_{n\to\infty} \|s_n\| = \lim_{n\to\infty} \int |s_n| = \int |f|,$$

so $f \in L^1(\mathbb{R}, X) \Leftrightarrow |f| \in L^1(\mathbb{R})$. It also follows that for real-valued integrable
functions $f \le g \Rightarrow \int f \le \int g$.

7. The integral is a continuous functional on $L^1(\mathbb{R}, X)$ (Example 8.9(4)),

$$\int (f + g) = \int f + \int g, \qquad \int \lambda f = \lambda \int f, \qquad \Big\|\int f\Big\| \le \int |f|.$$

If $f_n \to f$ in $L^1(\mathbb{R}, X)$ then $\int f_n \to \int f$ in $X$.

8. (a) $f \in L^1(\mathbb{R}) \Rightarrow \int f(t) x \, dt = (\int f) x$,

(b) $T \in B(X, Y) \Rightarrow \int Tf = T \int f$.

Proof (a) is a special case of (b) with $T : \mathbb{F} \to X$, $T(\lambda) := \lambda x$.

As an operator $T : X \to Y$ acts linearly on a simple function $s = \sum_{n=1}^{N} 1_{E_n} x_n \in S$,

$$Ts = \sum_{n=1}^{N} 1_{E_n} T x_n, \qquad \int Ts = \sum_{n=1}^{N} \mu(E_n) T x_n = T \int s.$$

If $s_n \to f$ in $L^1(\mathbb{R}, X)$ then $Ts_n \to Tf$ in $L^1(\mathbb{R}, Y)$, so $\int Tf = T \int f$.

9. For a measurable set $A \subseteq \mathbb{R}$, define $L^1(A) := \{ f 1_A : f \in L^1(\mathbb{R}) \}$, and let

$$\int_A f := \int f 1_A.$$

Note that $\int_A f = 0$ for any null set $A$. Hence if $f = g$ a.e., with $g \in L^1(\mathbb{R})$, and
$E = F$ a.e., then $f \in L^1(\mathbb{R})$ as well and $\int_E f = \int_F g$.

10. For $E$, $F$ disjoint measurable sets,

$$\int_{E \cup F} f = \int_E f + \int_F f.$$

It follows that $E \subseteq F \Rightarrow \int_E |f| \le \int_F |f|$.

Theorem 9.17 


$L^1(A)$ is a separable Banach space.
Fig. 9.2 Lebesgue

Henri Lebesgue (1875-1941) graduated at the École Normale
Supérieure of Paris at 27 years. His thesis built upon work
of Baire, Borel and Jordan, to generalize lengths and areas,
and so an integration powerful enough to tackle functions too
discontinuous for Riemann’s integration: the first complete
space of integrable functions. After a century of attempts by
other mathematicians, he finally proved that uniformly bounded
series of integrable functions, such as the Fourier series, could be
integrated term by term. Although his achievement was widely
seen as abstract, in his words, “Reduced to general theories,
mathematics would be a beautiful form without content. It would
quickly die.”


Proof Completeness: Let $f_n$ be a Cauchy sequence in $L^1(A)$, i.e., $\|f_n - f_m\| \to 0$.
Choose $s_n \in S$ close to $f_n$, say $\|s_n - f_n\| < 1/n$. Then $(s_n)$ is a Cauchy sequence
of simple functions, asymptotic to $f_n$. By Notes 5 and 7 above, $s_n$ converges to an
integrable function $f$ in $L^1(A)$. Hence, so does the asymptotic sequence $f_n$.

Separability: By construction, the separable set $S$ of simple functions is dense
in $L^1(A)$: Any $f \in L^1(A)$ has a sequence of simple functions converging to it,
$(s_n) \to f$ a.e., so $\|f - s_n\|_{L^1} \to 0$ as $n \to \infty$. □

Much the same analysis can be made starting with the norm $\|s\|_p := (\int |s|^p)^{1/p}$,
$1 \le p < \infty$, on $S$. The completion of $S$ in this norm is denoted by $L^p(A)$, which is
thus complete and separable ($S$ dense in it).

Proposition 9.18

If $f_n \to f$ in $L^\infty(\mathbb{R})$, that is, uniformly, and

(i) $f_n$ are continuous, then $f$ is continuous,

(ii) $f_n$ are integrable, then $f$ is integrable on $[a, b]$, and

$$\int_a^b f_n \to \int_a^b f,$$

(iii) $f_n'$ are continuous and converge uniformly, then $f_n' \to f'$.

Proof (i) The first assertion is a restatement of the fact that $C_b(\mathbb{R})$ is closed in $L^\infty(\mathbb{R})$
(Theorem 6.23).

(ii) The second follows from the completeness of $L^1[a, b]$ and the continuity of the
integral,

$$\|f_n - f\|_{L^1[a,b]} = \int_a^b |f_n - f| \le (b - a)\|f_n - f\|_{L^\infty[a,b]} \to 0.$$

(iii) If $f_n' \to g$ uniformly, then $f_n' \to g$ in $L^1[a, t]$ by (ii), and $\int_a^t f_n' \to \int_a^t g$. But,
assuming the fundamental theorem of calculus (Theorem 12.8), $\int_a^t f_n' = f_n(t) -
f_n(a)$, which converge to $f(t) - f(a)$ uniformly and in $L^1[a, t]$. So $\int_a^t g = f(t) -
f(a)$, showing $f$ is differentiable, with $f' = g$.

□
□ 


Examples 9.19 

1 . Convergence in L 1 (R) is quite different from uniform convergence. For example, 
the sequence of functions jj l[o.«| converge uniformly to 0, but not in L 1 (R), 
whereas the sequence n 1 1 0 \i n ~\ converges to 0 in Z/(R) but not uniformly. 

2. The product x ■ y of sequences becomes f ■ g := J fg for functions. Holder’s 
inequalities are valid: 

\\f9\\ L ^\\f\\Lp\\9\\ LP ', i = £ + £, 

II/IIl^ ll/IIMI/lllT*. + V- 

thus / lies in L P {A) for p in an interval of values. 

Proof Integrating \a(t)b(t)\ < and putting a = f/\\f\\ LP , b = 

g/\\g\\ LP ' , gives the first inequality. Substituting \ f\ r for /, \g\ r for g, p/ r for 
p, and q/r for p' gives the second inequality. Finally, the substitutions of |/|“ 
for /, I/I 1- " for g, and p/a for p, gives the third. 

3. ► When the domain of the functions is compact, the spaces are included in each
other as sets, in the reverse order of the sequence spaces,

$$C[a, b] \subset L^\infty[a, b] \subset L^2[a, b] \subset L^1[a, b],$$

because, by Hölder's inequality, $I : L^\infty[a, b] \to L^2[a, b] \to L^1[a, b]$ is continuous,

$$\|f\|_{L^1[a,b]} \le (b - a)^{1/2}\|f\|_{L^2[a,b]} \le (b - a)\|f\|_{L^\infty[a,b]}.$$

4. The notation $\int_{-\infty}^{\infty} f$ is capable of at least three interpretations, as (i) $\int_{\mathbb{R}} f$ when
$f \in L^1(\mathbb{R})$, (ii) $\lim_{R,S \to \infty} \int_{-S}^{R} f$, (iii) $\lim_{R \to \infty} \int_{-R}^{R} f$. It should be clear that
the finiteness of these integrals follows (i) $\Rightarrow$ (ii) $\Rightarrow$ (iii), but the examples
$\int_{-R}^{R} x\,dx = 0$ and $\int_0^R \frac{\sin x}{x}\,dx \to \pi/2$ as $R \to \infty$ show that the converses are
false.
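
The contrast in Example 1 can be checked with exact arithmetic. The following is a minimal sketch, assuming only that a scaled indicator $c\,1_{[0,\ell]}$ has sup norm $c$ and $L^1$ norm $c\ell$; the helper name `sup_and_l1` is hypothetical and chosen for illustration.

```python
from fractions import Fraction

def sup_and_l1(height, length):
    """Exact (sup norm, L^1 norm) of the step function height * 1_[0, length]."""
    return height, height * length

# (1/n) 1_[0,n]: the sup norm 1/n tends to 0, yet the L^1 norm stays equal to 1.
flat = [sup_and_l1(Fraction(1, n), Fraction(n)) for n in (1, 10, 100)]

# n 1_[0,1/n^2]: the L^1 norm 1/n tends to 0, yet the sup norm n grows.
spike = [sup_and_l1(Fraction(n), Fraction(1, n * n)) for n in (1, 10, 100)]
```

The first list shows uniform convergence without $L^1$ convergence, the second the reverse.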
Approximation of Functions

Proposition 9.20

The polynomials are dense in $L^1[a, b]$, $L^2[a, b]$ and $C[a, b]$.

Proof By construction, the step functions are dense in $L^1(\mathbb{R})$. Now, intuitively speaking,
any real-valued step function $s$ can be “nudged” into a continuous function $g$
by replacing its discontinuities with steep slopes, and the distance $\|s - g\|_{L^1}$ can be
made as small as needed by making the slopes steeper. More precisely and more
generally, any bounded measurable set $E$ in $\mathbb{R}$ lies between a compact set $K$ and an
open set $U$, such that $\mu(U \setminus K) < \epsilon$; also, there is a continuous function $g_E$ taking
values in $[0, 1]$ such that $g_E K = 1$, $g_E U^c = 0$ (Exercise 3.12(17)). So

$$\forall \epsilon > 0,\ \exists g_E \in C(\mathbb{R}),\quad \|g_E - 1_E\|_{L^1} = \int_{U \setminus K} |g_E - 1_E| \le \mu(U \setminus K) < \epsilon.$$

Consequently, taking any non-zero simple function $s = \sum_{n=1}^N 1_{E_n} x_n$ and replacing
each $1_{E_n}$ with continuous functions $g_n$, where $\|g_n - 1_{E_n}\|_{L^1} < \epsilon / \sum_{n=1}^N \|x_n\|$, gives
a continuous function $g := \sum_{n=1}^N g_n x_n$, which approximates $s$ in $L^1$,

$$\|s - g\|_{L^1} \le \sum_{n=1}^N \|1_{E_n} - g_n\|_{L^1}\|x_n\| < \epsilon.$$

But any function $f \in L^1(\mathbb{R})$ has a simple function approximation $s$, which in turn
can be approximated by a continuous function $g$. Combining these two facts gives

$$\|f - g\|_{L^1} \le \|f - s\|_{L^1} + \|s - g\|_{L^1} < 2\epsilon$$

showing that the set of (integrable) continuous functions is dense in $L^1(\mathbb{R})$. Note
further that precisely the same arguments work for $L^2(\mathbb{R})$.

We have already seen, in the Stone-Weierstraß theorem (Theorem 6.24), that the
set of polynomials $p(z, \bar z)$ is dense in $C[a, b]$. But, in this case, $z = \bar z = x \in [a, b]$,
so such polynomials are of the usual form $p \in \mathbb{C}[x]$. Combining this with the above
result shows that $\mathbb{C}[x]$ is also dense in $L^1[a, b]$ and $L^2[a, b]$: for any $\epsilon > 0$, there is
a polynomial $p \in \mathbb{C}[x]$ such that

$$\|f - p\|_{L^1[a,b]} \le \|f - g\|_{L^1[a,b]} + \|g - p\|_{L^1[a,b]} < 3\epsilon$$

since $\|g - p\|_{L^1[a,b]} \le (b - a)\|g - p\|_{C[a,b]} < \epsilon$. □

More generally, the polynomial splines are dense in the real version of these
spaces. A spline of degree $N$ is a function $\sum_n 1_{E_n} p_n$, where $E_n$ are disjoint intervals
and $p_n$ are polynomials of degree at most $N$ such that the first $N - 1$ derivatives
match at the endpoints of $E_n$. They are often used in numerical techniques and
graphics computing. Another useful way of approximating integrable functions uses
the convolution:

Proposition 9.21 Approximations of the Identity 


If $h_n \in C(\mathbb{R})$ are such that $h_n \ge 0$, $\int h_n = 1$ and $h_n \to 0$ uniformly on
$\mathbb{R} \setminus [-\delta, \delta]$ for any $\delta > 0$, then $h_n * f \to f$ in $C(\mathbb{R})$ and $L^1[a, b]$.


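Proof Let $g$ be a continuous function, and let $x \in \mathbb{R}$; on the one hand,

$$\forall \epsilon > 0,\ \exists \delta > 0,\quad |y| < \delta \ \Rightarrow\ |g(x + y) - g(x)| < \epsilon, \tag{9.6}$$

and on the other hand, for this $\delta$,

$$\exists N,\quad n \ge N \text{ AND } |y| \ge \delta \ \Rightarrow\ 0 \le h_n(y) < \epsilon. \tag{9.7}$$

Therefore, for all $x$ and $n \ge N$,

$$\begin{aligned}
|h_n * g(x) - g(x)| &= \left|\int h_n(y)\,(g(x - y) - g(x))\,dy\right| \\
&\le \int h_n(y)\,|g(x - y) - g(x)|\,dy \\
&\le \int_{-\delta}^{\delta} h_n(y)\,\epsilon\,dy + \int_{|y| \ge \delta} 2\epsilon\|g\|_C\,dy \quad \text{by (9.6) and (9.7)} \\
&\le \epsilon(1 + 2\|g\|_C)
\end{aligned}$$

and $\|h_n * g - g\|_C \to 0$ as required.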

In fact $h_n * f$ approximates $f \in L^1[a, b]$ in the $L^1$-norm, for, choosing $g \in C[a, b]$
close to $f$, $\|f - g\|_{L^1[a,b]} < \epsilon$, and $n$ large enough that $\|h_n * g - g\|_C < \epsilon$
holds, then

$$\|h_n * f - f\|_{L^1[a,b]} \le \|h_n * g - g\|_{L^1[a,b]} + \|h_n * (f - g)\|_{L^1[a,b]} + \|f - g\|_{L^1[a,b]} < (b - a + 2)\epsilon$$

since $\|h_n * (f - g)\|_{L^1} \le \|h_n\|_{L^1}\|f - g\|_{L^1}$.


□ 
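
The convergence $h_n * g \to g$ of Proposition 9.21 can be observed numerically. The sketch below is illustrative only: the tent kernel $h_n(y) = n\max(0, 1 - n|y|)$, the test function $\cos$, and the midpoint quadrature are assumptions made for the example, not choices taken from the text.

```python
import math

def tent_kernel(n, y):
    """Hypothetical kernel h_n(y) = n*max(0, 1 - n|y|): continuous, >= 0, integral 1."""
    return n * max(0.0, 1.0 - n * abs(y))

def convolve_at(n, g, x, steps=2000):
    """Midpoint-rule value of (h_n * g)(x); h_n vanishes outside [-1/n, 1/n]."""
    width = (2.0 / n) / steps
    total = 0.0
    for k in range(steps):
        y = -1.0 / n + (k + 0.5) * width
        total += tent_kernel(n, y) * g(x - y) * width
    return total

def sup_error(n, g, xs):
    """Largest deviation |h_n * g(x) - g(x)| over the sample points xs."""
    return max(abs(convolve_at(n, g, x) - g(x)) for x in xs)

xs = [k / 10 for k in range(-30, 31)]
errors = [sup_error(n, math.cos, xs) for n in (1, 4, 16)]
```

The sampled sup-norm errors shrink as $n$ grows, in line with $\|h_n * g - g\|_C \to 0$.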
Corollary 9.22


$$\|f(x + y) - f(x)\|_{L^1(\mathbb{R})} \to 0 \text{ as } y \to 0$$


Proof The step functions $h_n := n 1_{[-\frac{1}{2n}, \frac{1}{2n}]}$ clearly form an approximation of the
identity, and so $h_n * f \to f$ in $L^1$. But their translations by $T_y f(x) := f(x - y)$,
namely $h_n^+ := T_{1/2n} h_n = n 1_{[0,1/n]}$ and $h_n^- := T_{-1/2n} h_n = n 1_{[-1/n,0]}$, form other
approximations of the identity. Since $(T_y h) * f = h * (T_y f)$,

$$\begin{aligned}
\int |f(x \mp \tfrac{1}{2n}) - f(x)|\,dx &= \|T_{\pm 1/2n} f - f\|_{L^1} \\
&\le \|T_{\pm 1/2n} f - (T_{\pm 1/2n} h_n) * f\|_{L^1} + \|(T_{\pm 1/2n} h_n) * f - f\|_{L^1} \\
&= \|T_{\pm 1/2n} f - h_n * (T_{\pm 1/2n} f)\|_{L^1} + \|h_n^\pm * f - f\|_{L^1} \\
&\to 0 \text{ as } n \to \infty.
\end{aligned}$$


□ 


Exercises 9.23 

1. The map $(a_n) \mapsto \sum_n a_n 1_{[n,n+1[}$ embeds $\ell^1$ into $L^1(\mathbb{R})$.

2. If $\sum_n \|f_n\|_{L^1}$ exists, then $\sum_{n=1}^\infty \int f_n = \int \sum_{n=1}^\infty f_n$.

3. The map $L^1(A) \to \mathbb{C}$, $f \mapsto \int gf$ is linear, and continuous when $g \in L^\infty(A)$.
Assuming surjectivity, show $L^1(K)^* = L^\infty(K)$ for $K \subseteq \mathbb{R}$ compact, and similarly
$L^p(K)^* = L^{p'}(K)$ $(p > 1)$.

4. Show that the functional $\delta_a(f) := f(a)$ on $C[a, b]$ does not correspond to any
$L^1$-function $\delta$ in the sense of $\delta_a(f) = \int \delta f$. Hence the dual space of $C[a, b]$ is
not $L^1[a, b]$; it consists of functionals called measures of bounded variation.

5. Minkowski's inequality: Emulate the proof of Proposition 9.12 to show

$$\|f + g\|_{L^p} \le \|f\|_{L^p} + \|g\|_{L^p} \quad (p \ge 1).$$

6. * $L^1(\mathbb{R})$ has a convolution defined on it,

$$f * g(t) := \int f(t - s)g(s)\,ds.$$

Just like the same-named operation in $\ell^1$, it is associative and commutative; but
it has no identity, although Gibbs and Dirac audaciously added one and called it
$\delta$. Young's inequality is satisfied,

$$\tfrac{1}{p} + \tfrac{1}{q} = 1 + \tfrac{1}{r} \ \Rightarrow\ \|f * g\|_{L^r} \le \|f\|_{L^p}\|g\|_{L^q}.$$
7. Matched Filter: An electronic filter is a circuit acting on a signal $f \in L^2(\mathbb{R})$
and outputting the convolution $g * f$ ($g \in L^1(\mathbb{R})$). Signals often have white
noise $\eta(t)$, where $\|g * \eta\|_{L^2} = \epsilon\|g\|_{L^2}$. The signal-to-noise ratio is $S/N :=
\|g * f\|^2_{L^\infty}/\|g * \eta\|^2_{L^2}$; show that $S/N \le \|f\|^2_{L^2}/\epsilon^2$, with equality holding when

$$g(x - t) = \lambda f(t), \quad \lambda \in \mathbb{R}.$$


The Fourier Series 

We end this chapter with a look at one of the most important operators on $L^1[0, 1]$.
Back in the days of Fourier, there arose the question of whether every periodic
function $f$ can be built up as a Fourier series $\sum_n a_n \cos nx + b_n \sin nx$. This claim
of Fourier was disputed by Lagrange and others; Dirichlet obtained a partial result
for the case $f \in C^2$, and Riemann later vastly extended this result. Despite these
protests, the use of Fourier series grew, mainly because they actually worked in many 
examples. 

Definition 9.24 


The Fourier coefficients of an integrable function $f \in L^1[0, 1]$ are the
sequence of numbers defined by

$$\mathcal{F}f(n) = \hat f(n) := \int_0^1 e^{-2\pi i n x} f(x)\,dx, \quad n \in \mathbb{Z}.$$


This section cannot do justice to the immense number of results and applications 
of Fourier series. It must suffice here to present a couple of main results, with the 
aim of generalizing them later on. Refer to [7] for more details. 

Theorem 9.25 


$\mathcal{F} : L^1[0, 1] \to c_0(\mathbb{Z})$ is a 1-1 continuous operator with

$$\|\hat f\|_{\ell^\infty} \le \|f\|_{L^1[0,1]}.$$


Here, $c_0(\mathbb{Z})$ is defined as consisting of those ‘sequences’ such that $a_n \to 0$
as $n \to \pm\infty$.

Proof That $\mathcal{F}$ is linear is easy to show. It is continuous because

$$\|\hat f\|_{\ell^\infty} = \sup_{n \in \mathbb{Z}} \left|\int_0^1 e^{-2\pi i n x} f(x)\,dx\right| \le \int_0^1 |f(x)|\,dx = \|f\|_{L^1[0,1]}.$$
The characteristic function $1_{[a,b]}$, for $[a, b] \subseteq [0, 1]$, has Fourier coefficients

$$\hat 1_{[a,b]}(n) = \int_a^b e^{-2\pi i n x}\,dx = \frac{e^{-2\pi i n a} - e^{-2\pi i n b}}{2\pi i n} \to 0 \text{ as } n \to \pm\infty.$$

Hence the vector space of simple functions, as well as its closure $L^1[0, 1]$, are mapped
into the complete space $c_0$ (Exercises 8.10(8)).

$\mathcal{F}$ is 1-1: If $\hat f(n) = 0$ for every $n$, then

$$\int_0^1 e^{-2\pi i n y} f(y)\,dy = 0, \quad \forall n \in \mathbb{Z}.$$

The aim is to show that $f = 0$ a.e. Firstly,

$$\int_0^1 e^{-2\pi i n y} f(x - y)\,dy = \int_0^1 e^{-2\pi i n (x - y)} f(y)\,dy = 0.$$

Secondly, since $(\cos \pi y)^{2n} = (e^{2\pi i y} + e^{-2\pi i y} + 2)^n/2^{2n}$ is a linear combination
of exponentials of various frequencies that are all multiples of $2\pi y$, we have, for
$h_n(y) := (\cos \pi y)^{2n}/c_n$,

$$h_n * f(x) = \frac{1}{c_n}\int_0^1 (\cos \pi y)^{2n} f(x - y)\,dy = 0,$$

where $c_n := \int_0^1 (\cos \pi y)^{2n}\,dy = \binom{2n}{n}/2^{2n}$.

The functions $h_n$ satisfy the criteria of Proposition 9.21, as they are positive and
fall rapidly to 0 for $|y| \ge \delta$, as $n \to \infty$. Thus $\|f\|_{L^1} = \|h_n * f - f\|_{L^1} \to 0$, and
$f = 0$ a.e. □
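
The closed form for the coefficients of a characteristic function, the bound $|\hat f(n)| \le \|f\|_{L^1}$, and the decay to 0 can be checked numerically. This is a minimal sketch: the interval $[0.25, 0.75]$, the helper names, and the midpoint rule are assumptions made for illustration.

```python
import cmath

def indicator_coeff(a, b, n):
    """Closed form of the n-th Fourier coefficient of 1_[a,b], with [a,b] inside [0,1]."""
    if n == 0:
        return complex(b - a)
    w = 2j * cmath.pi * n
    return (cmath.exp(-w * a) - cmath.exp(-w * b)) / w

def numeric_coeff(f, n, steps=20000):
    """Midpoint-rule approximation of the integral of e^{-2 pi i n x} f(x) over [0,1]."""
    h = 1.0 / steps
    return h * sum(cmath.exp(-2j * cmath.pi * n * (k + 0.5) * h) * f((k + 0.5) * h)
                   for k in range(steps))

a, b = 0.25, 0.75
indicator = lambda x: 1.0 if a <= x <= b else 0.0

# Quadrature versus closed form for a few frequencies.
gaps = [abs(numeric_coeff(indicator, n) - indicator_coeff(a, b, n)) for n in range(-5, 6)]
# Magnitudes at increasing odd frequencies: bounded by b - a and decaying like 1/|n|.
sizes = [abs(indicator_coeff(a, b, n)) for n in (1, 11, 101)]
```

For this interval the magnitudes at odd $n$ equal $1/(\pi |n|)$, so they stay below $\|1_{[a,b]}\|_{L^1} = b - a$ and tend to 0.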

The Fourier coefficients have properties that appear remarkable: when $f$ is translated
the coefficients rotate in $\mathbb{C}$, with each $\hat f(n)$ performing $n$ turns as $f$ is translated
one whole period; differentiation of $f$ scales the coefficients by a multiple of $n$; and
convolutions are transformed to multiplications.

Proposition 9.26 


For periodic functions, with period 1,

$$\widehat{T_a f}(n) = e^{-2\pi i a n}\hat f(n), \qquad \hat{f'}(n) = 2\pi i n \hat f(n), \qquad \widehat{f * g} = \hat f \hat g.$$
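Proof A translation $T_a f(x) := f(x - a)$ has the effect

$$\widehat{T_a f}(n) = \int_0^1 e^{-2\pi i n x} f(x - a)\,dx = \int_0^1 e^{-2\pi i n (x + a)} f(x)\,dx = e^{-2\pi i n a}\hat f(n).$$

Differentiation, $f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$, gives

$$\hat{f'}(n) = \lim_{h \to 0} \left(\frac{T_{-h} f - f}{h}\right)^{\wedge}(n) = \lim_{h \to 0} \frac{e^{2\pi i n h} - 1}{h}\hat f(n) = 2\pi i n \hat f(n),$$

and the convolution of $f$ and $g$ becomes

$$\begin{aligned}
\widehat{f * g}(n) &= \int_0^1 e^{-2\pi i n x} \int_0^1 f(x - y)g(y)\,dy\,dx \\
&= \int_0^1 \int_0^1 e^{-2\pi i n (x + y)} f(x)\,dx\,g(y)\,dy \\
&= \int_0^1 e^{-2\pi i n x} f(x)\,dx \int_0^1 e^{-2\pi i n y} g(y)\,dy = \hat f(n)\hat g(n).
\end{aligned}$$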
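
The translation and differentiation rules of Proposition 9.26 can be verified on a trigonometric polynomial, for which an equally spaced Riemann sum gives the Fourier coefficients up to rounding. The sketch below is illustrative: the test function, the shift $a = 0.3$, and the helper `fourier_coeff` are assumptions of the example.

```python
import cmath
import math

def fourier_coeff(f, n, samples=64):
    """Riemann-sum Fourier coefficient on [0,1]; exact up to rounding for
    trigonometric polynomials of degree below samples - |n|."""
    return sum(cmath.exp(-2j * math.pi * n * k / samples) * f(k / samples)
               for k in range(samples)) / samples

def f(x):
    return 1.0 + 2.0 * math.cos(2 * math.pi * x) + math.sin(4 * math.pi * x)

def f_prime(x):
    return (-4.0 * math.pi * math.sin(2 * math.pi * x)
            + 4.0 * math.pi * math.cos(4 * math.pi * x))

a = 0.3
translated = lambda x: f(x - a)  # T_a f

# Largest violation of each rule over the frequencies -3..3.
translation_gap = max(abs(fourier_coeff(translated, n)
                          - cmath.exp(-2j * math.pi * a * n) * fourier_coeff(f, n))
                      for n in range(-3, 4))
derivative_gap = max(abs(fourier_coeff(f_prime, n)
                         - 2j * math.pi * n * fourier_coeff(f, n))
                     for n in range(-3, 4))
```

Both gaps come out at rounding level, consistent with the first two identities of the proposition.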


Exercises 9.27 

1 . Show 

(a) $\mathcal{F} : 1 \mapsto e_0 = (\ldots, 0, 0, 1, 0, 0, \ldots)$,

(b) 1 , 1 , 

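(c) $\mathcal{F} : |x - \tfrac12| \mapsto \tfrac{1}{\pi^2}(\ldots, 0, 1, \tfrac{\pi^2}{4}, 1, 0, \tfrac19, 0, \tfrac{1}{25}, \ldots)$,

(d) $\mathcal{F} : x(x - \tfrac12)(x - 1) \mapsto -\tfrac{3i}{4\pi^3}(\ldots, -\tfrac18, -1, 0, 1, \tfrac18, \ldots, \tfrac{1}{n^3}, \ldots)$.

2. Using the open mapping theorem, show that $\mathcal{F}$ is not onto $c_0$.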

3. The power spectrum of a function is a plot of $|\hat f(n)|^2$ (often with $n$ varying
continuously in $\mathbb{R}$). It displays the dominant frequencies of $f$. A better plot is
the Nyquist diagram, where $\hat f(n)$ is graphed in three dimensions, with one axis
representing $n$, and the other two representing $\hat f = |\hat f|e^{i\phi}$.

Prove that $\mathcal{F} : C^k[0, 1] \to c_k(\mathbb{Z})$, where $C^k[0, 1]$ is the space of $k$-times continuously
differentiable periodic functions, and $c_k(\mathbb{Z}) := \{\, (a_n) \in c_0 : n^k a_n \to 0 \,\}$.
Therefore, how fast the power spectrum decays as $n \to \infty$ measures how smooth
the function is.
4. The operator $S_a : f(x) \mapsto a^{1/2} f(ax)$ ($a > 0$) stretches or compresses $f$, while
preserving its $L^2$-norm; prove $\widehat{S_a f}(n) = S_{1/a}\hat f(n)$. This should be familiar:
playing a sound clip in half its normal time doubles the frequencies.

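5. Recall the analogous Fourier transform on $L^1(\mathbb{R})$, $\hat f(\xi) := \int e^{-2\pi i \xi t} f(t)\,dt$.
Similarly to the Fourier series,

(a) it is a continuous linear operator $\mathcal{F} : L^1(\mathbb{R}) \to C_0(\mathbb{R})$,

(b) $\hat 1_{[-a/2,a/2]}(\xi) = a \sin(\pi a \xi)/(\pi a \xi) =: a\,\mathrm{sinc}(\pi a \xi) \to 0$ as $\xi \to \pm\infty$,

(c) $\widehat{T_a f}(\xi) = e^{-2\pi i a \xi}\hat f(\xi)$,

(d) $\hat{f'}(\xi) = 2\pi i \xi \hat f(\xi)$,

(e) $\widehat{f * g} = \hat f \hat g$.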

6. $\mathcal{F} : e^{-\pi x^2/a} \mapsto \sqrt{a}\,e^{-\pi a \xi^2}$. Deduce that the convolution of two Gaussian
functions is another Gaussian function,

$$e^{-x^2/2\sigma^2} * e^{-x^2/2\tau^2} = \sqrt{2\pi}\,\frac{\sigma\tau}{\sqrt{\sigma^2 + \tau^2}}\,e^{-x^2/2(\sigma^2 + \tau^2)}.$$

Notice how there is a trade-off between the ‘width’ $\sigma$ of the original Gaussian
and that of its Fourier transform, namely $1/\sigma$.

7. Wiener-Khinchin theorem: For $f \in L^1(\mathbb{R})$, define $f^*(x) := \overline{f(-x)}$. Show
$\widehat{f^*} = \overline{\hat f}$, and the auto-correlation function $f^* * f(x) = \int \overline{f(t)} f(t + x)\,dt$
is transformed to the power spectrum $|\hat f(\xi)|^2$. More generally, $f^* * g$ is called
the cross-correlation function of $f$ and $g$.

Remarks 9.28 

1. The functionals on $\ell^\infty$ are more difficult to describe. Every sequence $y \in \ell^1$ still
acts as a functional on $\ell^\infty$ via $x \mapsto y \cdot x$, but $\ell^{\infty *}$ is a complicated non-separable
space that includes much more than just $\ell^1$ (look up “finitely additive measures”
for more).

2. We often make remarks like “the dual space of $c_0$ is $\ell^1$”; this is not literally true
because a functional on $c_0$ is not a sequence, but the application of one, i.e., it is
the map $x \mapsto y \cdot x$, not $y$. But the two are mathematically the same object in different clothing,
and functionals on $c_0$ do behave like the sequences in $\ell^1$.

3. $\ell^\infty = C_b(\mathbb{N})$, so the completeness part of Theorem 9.1 is included in
Theorem 6.23.

4. The Fibonacci iteration $a_n := a_{n-1} + a_{n-2}$, starting from $a_0 = 1 = a_1$, is an
equation on sequences. It can be written as

$$x = Rx + R^2 x + e_1 + e_0$$

or as

$$(e_0 - e_1 - e_2) * x = e_0 + e_1,$$

$$(1, -1, -1, 0, \ldots) * x = (1, 1, 0, \ldots)$$

when expressed in terms of convolutions. Convolving with the inverse of
$(1, -1, -1, 0, \ldots)$ gives the terms of the Fibonacci sequence (but note that the
inverse is not in $\ell^1$). Traditionally, “generating functions” are used to get the same
results, the connection being elucidated in Chap. 14.

5. I 1 contains the space of sequences := { (a n ) : (; n s a n ) e l 1 }, (s ^ 0), which 
in turn contains 1+e . 

6. One can show that as $p \to \infty$, $\|x\|_{\ell^p} \to \|x\|_{\ell^\infty}$, if $x$ belongs to some $\ell^q$.

7. The following are some classical criteria for determining that a sequence of measurable
functions $f_n$ that converges pointwise a.e. is Cauchy in $L^1(A)$,

(a) $|f_n|$ are increasing but $\int |f_n|$ are bounded (Monotone Convergence Theorem),

(b) $|f_n| \le g \in L^1(A)$ (Dominated Convergence Theorem),

(c) $\int_E f_n$ converges for all measurable sets $E$ (Vitali's theorem).

8. A function has both local and global integrability properties: locally about $x \in \mathbb{R}$,
it may belong to some $L^p[x - \delta, x + \delta]$ space, while globally, the sequence of
numbers $a_n := \|f\|_{L^p[n,n+1]}$ may belong to $\ell^q$. For example, $f$ is in $L^1(\mathbb{R})$ when
it is locally in $L^1$ and globally in $\ell^1$. $L^p_{\mathrm{loc}}$ are spaces of functions that are only
locally in $L^p$.

9. The Fourier series maps $\mathcal{F} : L^p[0, 1] \to \ell^{p'}$ for $1 \le p \le 2$ (see
Exercise 10.35(14) for $p = 2$).
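
The convolution equation in Remark 4 can be solved term by term, since the $n$-th entry of $k * x$ only involves $x_0, \ldots, x_n$. The following is a minimal sketch under that observation; the function name `deconvolve` is hypothetical, and exact rational arithmetic is used only to avoid rounding.

```python
from fractions import Fraction

def deconvolve(kernel, rhs, terms):
    """First `terms` entries of the one-sided sequence x solving kernel * x = rhs,
    where * is convolution of sequences and kernel[0] != 0."""
    x = []
    for n in range(terms):
        acc = Fraction(rhs[n]) if n < len(rhs) else Fraction(0)
        # Subtract the contributions kernel[k] * x[n-k] of the terms already known.
        for k in range(1, min(n, len(kernel) - 1) + 1):
            acc -= kernel[k] * x[n - k]
        x.append(acc / kernel[0])
    return x

kernel = [1, -1, -1]                        # the sequence (1, -1, -1, 0, ...)
inverse = deconvolve(kernel, [1], 10)       # solves kernel * inverse = e_0
solution = deconvolve(kernel, [1, 1], 10)   # right-hand side (1, 1, 0, ...)
```

Both outputs run through Fibonacci numbers; the entries of `inverse` grow without bound, which illustrates why the inverse is not in $\ell^1$.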


Chapter 10 

Hilbert Spaces 


10.1 Inner Products 

There are spaces, such as $\ell^2$, whose norms have special properties because they are
induced from what are termed inner products. Not only do such spaces have a concept 
of length but also of orthogonality between vectors. 

Definition 10.1 


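An inner product on a vector space $X$ is a positive-definite sesquilinear form,¹
namely a map

$$\langle\ ,\ \rangle : X \times X \to \mathbb{F}$$

such that for all $x, y, z \in X$, $\lambda \in \mathbb{F}$,

$$\langle x, y + z\rangle = \langle x, y\rangle + \langle x, z\rangle, \qquad \langle x, \lambda y\rangle = \lambda\langle x, y\rangle,$$

$$\langle y, x\rangle = \overline{\langle x, y\rangle}, \qquad \langle x, x\rangle \ge 0; \quad \langle x, x\rangle = 0 \Leftrightarrow x = 0.$$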


Easy Consequences 

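1. If for all $x \in X$, $\langle x, y\rangle = 0$, then $y = 0$.

2. $\langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle$, but $\langle \lambda x, y\rangle = \bar\lambda\langle x, y\rangle$ (conjugate-linear).

3. $\langle x, x\rangle$ is real (and positive); its square-root is denoted by $\|x\| := \sqrt{\langle x, x\rangle}$.

4. $\|\lambda x\| = |\lambda|\,\|x\|$, and $\|x\| = 0 \Leftrightarrow x = 0$.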


1 In the mathematical literature, the inner product is often taken to be linear in the first variable; 
this is a matter of convention. The choice adopted here is that of the “physics” community; it 
makes many formulas, such as the definition $x^*(y) := \langle x, y\rangle$, more natural and conforming
with function notation. 
5. $\|x + y\|^2 = \|x\|^2 + 2\,\mathrm{Re}\,\langle x, y\rangle + \|y\|^2$.

6. (Pythagoras) If $\langle x, y\rangle = 0$ then $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.
More generally, if $\langle x_i, x_j\rangle = 0$ for $i \ne j$ then (by induction)

$$\|x_1 + \cdots + x_N\|^2 = \|x_1\|^2 + \cdots + \|x_N\|^2.$$


We will see next that the triangle inequality is also true, making $\|\cdot\|$ a norm, thus
inner product spaces are normed spaces. Two vectors are orthogonal or perpendicular
when $\langle x, y\rangle = 0$, also written as $x \perp y$. More generally, two subsets are said to be
orthogonal, $A \perp B$, when any two vectors, $a \in A$, $b \in B$, are orthogonal, $\langle a, b\rangle = 0$.

Examples 10.2 

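1. The simplest examples are the Euclidean spaces $\mathbb{R}^N$ and $\mathbb{C}^N$ with

$$\left\langle \begin{pmatrix} a_1 \\ \vdots \\ a_N \end{pmatrix}, \begin{pmatrix} b_1 \\ \vdots \\ b_N \end{pmatrix} \right\rangle := (\bar a_1 \cdots \bar a_N) \begin{pmatrix} b_1 \\ \vdots \\ b_N \end{pmatrix} = \sum_{n=1}^N \bar a_n b_n.$$

More generally, take any basis $v_1, \ldots, v_N$ of $\mathbb{F}^N$, expand any two vectors $x$ and
$y$ as $x = \sum_{n=1}^N a_n v_n$, $y = \sum_{n=1}^N b_n v_n$, and define $\langle x, y\rangle := \sum_{n=1}^N \bar a_n b_n$. (The
inner product differs depending on the choice of the basis.)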

2. The matrices of size $M \times N$ have an inner product given by

$$\langle A, B\rangle := \sum_{i=1}^M \sum_{j=1}^N \bar A_{ij} B_{ij}.$$

3. ► $\ell^2$ has the inner product $\langle (a_n), (b_n)\rangle := \sum_n \bar a_n b_n$. The fact that this series
converges follows from Cauchy's inequality $|\sum_n \bar a_n b_n| \le \|(a_n)\|\,\|(b_n)\|$.

4. ► $L^2(A)$ has the inner product $\langle f, g\rangle := \int_A \bar f g$. That this integral has a finite
value follows from Hölder's inequality $|\int_A \bar f g| \le \|fg\|_{L^1} \le \|f\|_{L^2}\|g\|_{L^2}$.

5. The weighted $\ell^2$ and $L^2$ spaces generalize these formulae to

$$\langle (a_n), (b_n)\rangle := \sum_n \bar a_n b_n w_n, \qquad \langle f, g\rangle := \int \overline{f(x)}\,g(x)\,w(x)\,dx$$

respectively, where $w_n$ and $w(x)$ are called weights; what properties do they
need to have for the inner product axioms to hold?

Our first proposition generalizes Cauchy's inequality (Proposition 7.4) from $\ell^2$ to
a general inner product space. It is probably the most used inequality in analysis.
Proposition 10.3 Cauchy-Schwarz inequality

$$|\langle x, y\rangle| \le \|x\|\,\|y\|$$


Proof The inequality need only be shown for $y$ non-zero. Any other vector $x$ can
be decomposed uniquely into two parts, one in the direction of $y$, and the other
perpendicular to it:

$$x = \lambda y + (x - \lambda y), \text{ with } \langle x - \lambda y, y\rangle = 0.$$

This yields $\lambda = \langle y, x\rangle/\langle y, y\rangle$. Applying Pythagoras' theorem,
we deduce that

$$\|x\|^2 = \|\lambda y\|^2 + \|x - \lambda y\|^2,$$

hence $\|\lambda y\| \le \|x\|$, or $|\lambda| \le \|x\|/\|y\|$, from which
follows the assertion. □
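
The decomposition used in this proof is easy to check on concrete vectors. The sketch below works in $\mathbb{C}^3$ with the inner product conjugate-linear in the first variable, as in Definition 10.1; the sample vectors and helper names are assumptions chosen for illustration.

```python
import math

def inner(x, y):
    """Inner product on C^N, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

x = [1 + 2j, -1j, 3.0]
y = [2.0, 1 - 1j, 1j]

lam = inner(y, x) / inner(y, y)                  # lambda = <y, x> / <y, y>
parallel = [lam * b for b in y]                  # the part of x along y
residual = [a - p for a, p in zip(x, parallel)]  # x - lambda*y, orthogonal to y

pythagoras_gap = norm(x) ** 2 - norm(parallel) ** 2 - norm(residual) ** 2
```

Here `residual` is orthogonal to `y`, the Pythagoras identity holds up to rounding, and the Cauchy-Schwarz inequality follows as in the proof.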

Corollary 10.4

$$\|x + y\| \le \|x\| + \|y\|$$



Proof Using the Cauchy-Schwarz inequality, $\mathrm{Re}\,\langle x, y\rangle \le |\langle x, y\rangle| \le \|x\|\,\|y\|$, so

$$\|x + y\|^2 = \|x\|^2 + 2\,\mathrm{Re}\,\langle x, y\rangle + \|y\|^2 \le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2.$$

□

Hence || • || is a norm, and all the facts about normed spaces apply to inner product 
spaces. For example, the norm is continuous. 

Proposition 10.5 

The inner product is continuous. 


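Proof Let $x_n \to x$ and $y_n \to y$, then since $y_n$ are bounded (Example 4.3(5)),

$$\begin{aligned}
|\langle x_n, y_n\rangle - \langle x, y\rangle| &\le |\langle x_n - x, y_n\rangle| + |\langle x, y_n - y\rangle| \\
&\le \|x_n - x\|\,\|y_n\| + \|x\|\,\|y_n - y\| \\
&\to 0.
\end{aligned}$$

It follows that taking limits commutes with the inner product:

$$\lim_{n \to \infty} \langle x_n, y_n\rangle = \left\langle \lim_{n \to \infty} x_n, \lim_{n \to \infty} y_n \right\rangle.$$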
David Hilbert (1862-1943) studied invariant theory under Lindemann
at Königsberg until 1885. His encyclopedic powers
motivated him to explore much of mathematics; in 1899, in
Göttingen, he gave rigorous axioms for Euclidean geometry;
1904-9, he studied Fredholm's integral equations, with his student
Schmidt; he defined compact operators, proving they are
limits of matrices, with their spectrum of eigenvalues; (Schmidt)
defined $\ell^2$ with its inner product. On to mathematical physics,
quite possibly he inspired Einstein's general relativity. His 1918
‘formalist’ research programme set out to prove that set axioms
are consistent, “one can solve any problem by pure thought”.


Fig. 10.1 Hilbert 


Definition 10.6 


A Hilbert space is an inner product space which is complete as a metric space. 


In the rest of the text, the letter H denotes a Hilbert space. 

Examples 10.7 

1. $\mathbb{R}^N$, $\mathbb{C}^N$, $\ell^2$ and $L^2(\mathbb{R})$ are all Hilbert spaces (Theorem 8.22, 9.8).

2. Every inner product space can be completed to a Hilbert space. In the completion
as a normed space (Proposition 7.17), take $\langle x, y\rangle := \lim_{n \to \infty} \langle x_n, y_n\rangle$, for
representative Cauchy sequences $x = [x_n]$, $y = [y_n]$. Note that $\langle x_n, y_n\rangle$ is a Cauchy
sequence in $\mathbb{C}$ since

$$\begin{aligned}
|\langle x_n, y_n\rangle - \langle x_m, y_m\rangle| &\le |\langle x_n, y_n\rangle - \langle x_m, y_n\rangle| + |\langle x_m, y_n\rangle - \langle x_m, y_m\rangle| \\
&\le \|x_n - x_m\|\,\|y_n\| + \|x_m\|\,\|y_n - y_m\| \to 0
\end{aligned}$$

as $n, m \to \infty$, with $\|x_m\|$, $\|y_n\|$ bounded.

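3. ► For an inner product space over $\mathbb{C}$, if $\langle x, Tx\rangle = 0$ for all $x \in X$, then $T = 0$.
Proof The identities

$$0 = \langle x + y, T(x + y)\rangle = \langle x, Ty\rangle + \langle y, Tx\rangle,$$

$$0 = \langle x + iy, T(x + iy)\rangle = i\langle x, Ty\rangle - i\langle y, Tx\rangle,$$

together imply $\langle x, Ty\rangle = 0$, for any $x, y \in X$, in particular $\|Ty\|^2 = 0$.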

4. An alternative proof of the Cauchy-Schwarz inequality is

$$0 \le \|u - \lambda v\|^2 = 1 - 2\,\mathrm{Re}\,\lambda\langle u, v\rangle + |\lambda|^2$$

for $u := x/\|x\|$, $v := y/\|y\|$ unit vectors and all $\lambda \in \mathbb{F}$, in particular for
$\lambda = |\langle u, v\rangle|/\langle u, v\rangle$.

5. $\|x\| = \sup_{\|y\| = 1} |\langle x, y\rangle|$, with the maximum achieved when $y = x/\|x\|$.

Do all norms on vector spaces come from inner products, and if not, which property 
characterizes inner product spaces? The answer is given by 

Proposition 10.8 Parallelogram law 


A norm is induced from an inner product if, and only if, it satisfies, for all
vectors $x, y$,

$$\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2).$$


The statement asserts that the sum of the lengths 
squared of the diagonals of a parallelogram equals 
that of the sides. 



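Proof The parallelogram law follows from adding the identities,

$$\|x + y\|^2 = \|x\|^2 + 2\,\mathrm{Re}\,\langle x, y\rangle + \|y\|^2,$$
$$\|x - y\|^2 = \|x\|^2 - 2\,\mathrm{Re}\,\langle x, y\rangle + \|y\|^2.$$

Subtracting the two gives $4\,\mathrm{Re}\,\langle x, y\rangle$. This is already sufficient to identify the inner
product when the scalar field is $\mathbb{R}$. Over $\mathbb{C}$, notice that $\mathrm{Im}\,\langle x, y\rangle = -\mathrm{Re}\,i\langle x, y\rangle =
\mathrm{Re}\,\langle ix, y\rangle$, so

$$\langle x, y\rangle = \tfrac14\left(\|y + x\|^2 - \|y - x\|^2 + i\|y + ix\|^2 - i\|y - ix\|^2\right). \tag{10.1}$$

This remarkable polarization identity expresses the inner product purely in terms of
norms. Accordingly, for the converse of the proposition, define

for any normed space, $\langle\!\langle x, y\rangle\!\rangle := \tfrac14(\|y + x\|^2 - \|y - x\|^2)$,
for a complex space, $\langle x, y\rangle := \langle\!\langle x, y\rangle\!\rangle + i\langle\!\langle ix, y\rangle\!\rangle$.

Two of the inner product axioms follow from $\langle\!\langle y, x\rangle\!\rangle = \langle\!\langle x, y\rangle\!\rangle$ and $\langle x, x\rangle =
\langle\!\langle x, x\rangle\!\rangle = \|x\|^2$, as well as $\langle x, 0\rangle = \langle\!\langle x, 0\rangle\!\rangle = 0$; $\langle y, x\rangle = \overline{\langle x, y\rangle}$ is readily
verified using

$$4\langle\!\langle iy, x\rangle\!\rangle = \|x + iy\|^2 - \|x - iy\|^2 = \|y - ix\|^2 - \|y + ix\|^2 = -4\langle\!\langle ix, y\rangle\!\rangle.$$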
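Showing that linearity holds when the parallelogram law is satisfied is the hardest
part of the proof. Writing

$$2y \pm x = (y + z \pm x) + (y - z),$$

$$2z \pm x = (y + z \pm x) - (y - z),$$

and using the parallelogram law,

$$\begin{aligned}
4\langle\!\langle x, 2y\rangle\!\rangle + 4\langle\!\langle x, 2z\rangle\!\rangle &= \|2y + x\|^2 - \|2y - x\|^2 + \|2z + x\|^2 - \|2z - x\|^2 \\
&= \|2y + x\|^2 + \|2z + x\|^2 - \|2y - x\|^2 - \|2z - x\|^2 \\
&= 2\|y + z + x\|^2 + 2\|y - z\|^2 - 2\|y + z - x\|^2 - 2\|y - z\|^2 \\
&= 8\langle\!\langle x, y + z\rangle\!\rangle.
\end{aligned}$$

In particular, putting $z = 0$ gives $\langle\!\langle x, 2y\rangle\!\rangle = 2\langle\!\langle x, y\rangle\!\rangle$, reducing the above identity to

$$\langle\!\langle x, y + z\rangle\!\rangle = \langle\!\langle x, y\rangle\!\rangle + \langle\!\langle x, z\rangle\!\rangle. \tag{10.2}$$

By induction, it follows that $\langle\!\langle x, ny\rangle\!\rangle = n\langle\!\langle x, y\rangle\!\rangle$ for $n \in \mathbb{N}$. For the negative integers,

$$\langle\!\langle x, -y\rangle\!\rangle = \tfrac14\left(\|-y + x\|^2 - \|-y - x\|^2\right) = -\langle\!\langle x, y\rangle\!\rangle$$

while for rational numbers $p = m/n$, $m, n \in \mathbb{Z}$, $n \ne 0$,

$$n\langle\!\langle x, \tfrac{m}{n} y\rangle\!\rangle = \langle\!\langle x, my\rangle\!\rangle = m\langle\!\langle x, y\rangle\!\rangle$$

so $\langle\!\langle x, py\rangle\!\rangle = p\langle\!\langle x, y\rangle\!\rangle$. Note that $\langle\!\langle x, y\rangle\!\rangle$ is continuous in $x$ and $y$ since the norm is
continuous, so if the rational numbers $p_n \to \alpha \in \mathbb{R}$, then

$$\langle\!\langle x, \alpha y\rangle\!\rangle = \lim_{n \to \infty} \langle\!\langle x, p_n y\rangle\!\rangle = \lim_{n \to \infty} p_n\langle\!\langle x, y\rangle\!\rangle = \alpha\langle\!\langle x, y\rangle\!\rangle.$$

This completes the proof when the scalar field is $\mathbb{R}$. Over the complex numbers,
$\langle x, \lambda y\rangle = \lambda\langle x, y\rangle$ for $\lambda \in \mathbb{C}$ is evident from (10.1), (10.2), and

$$\langle x, iy\rangle = -\langle\!\langle ix, y\rangle\!\rangle + i\langle\!\langle x, y\rangle\!\rangle = i\langle x, y\rangle. \qquad □$$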
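
The polarization identity (10.1) and the role of the parallelogram law can be tested numerically. This is a minimal sketch in $\mathbb{C}^3$ and $\mathbb{R}^2$; the sample vectors and the helper names `polarization` and `parallelogram_defect` are assumptions of the example.

```python
import math

def inner(x, y):
    """Inner product on C^N, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

def norm_sq(x):
    return inner(x, x).real

def polarization(x, y):
    """Right-hand side of (10.1), computed from squared norms only."""
    def nsq(sign, factor):
        return norm_sq([b + sign * factor * a for a, b in zip(x, y)])
    return 0.25 * (nsq(1, 1) - nsq(-1, 1) + 1j * nsq(1, 1j) - 1j * nsq(-1, 1j))

def parallelogram_defect(norm, x, y):
    """||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2); zero for inner product norms."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    return norm(plus) ** 2 + norm(minus) ** 2 - 2 * (norm(x) ** 2 + norm(y) ** 2)

two_norm = lambda v: math.sqrt(norm_sq(v))
one_norm = lambda v: sum(abs(t) for t in v)

x = [1 + 2j, -1j, 3.0]
y = [2.0, 1 - 1j, 1j]
identity_gap = abs(polarization(x, y) - inner(x, y))
defect_two = parallelogram_defect(two_norm, [1.0, 0.0], [0.0, 1.0])
defect_one = parallelogram_defect(one_norm, [1.0, 0.0], [0.0, 1.0])
```

The 2-norm has zero defect, while the 1-norm on $\mathbb{R}^2$ fails the law for the two standard basis vectors, so it cannot come from an inner product.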

In a sense, it is the presence of orthogonality that distinguishes inner product
spaces from normed ones. By the polarization identity, two vectors are perpendicular
when $\|x + y\| = \|x - y\|$ and $\|x + iy\| = \|x - iy\|$. Each vector, and more generally
each subspace, is complemented by a subspace of those vectors that are perpendicular
to it.
Proposition 10.9 Properties of Orthogonal Spaces

The orthogonal spaces of subsets $A \subseteq X$,

$$A^\perp := \{\, x \in X : \langle x, a\rangle = 0,\ \forall a \in A \,\},$$

satisfy

(i) $A \cap A^\perp \subseteq 0$,

(ii) $A \subseteq B \Rightarrow B^\perp \subseteq A^\perp$, and $A \subseteq A^{\perp\perp}$,

(iii) $A^\perp$ is a closed subspace of $X$,

(iv) $A^\perp = \overline{[\![A]\!]}^\perp$.

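Proof (i) If a vector $a \in A$ is also in $A^\perp$, then it is orthogonal to all vectors in $A$,
including itself, $\langle a, a\rangle = 0$, so $a = 0$.

(ii) If $a \in A \subseteq B$ and $x \in B^\perp$, then $\langle x, a\rangle = 0$, so $x \in A^\perp$. For any $a \in A$ and
$x \in A^\perp$, $\langle a, x\rangle = \overline{\langle x, a\rangle} = 0$, so $a \in A^{\perp\perp}$.

(iii) If $x$ and $y$ are in $A^\perp$ and $a \in A$, then

$$\langle \lambda x, a\rangle = \bar\lambda\langle x, a\rangle = 0, \qquad \langle x + y, a\rangle = \langle x, a\rangle + \langle y, a\rangle = 0,$$

so $\lambda x, x + y \in A^\perp$. If $x_n \in A^\perp$ and $x_n \to x$, then $0 = \langle x_n, a\rangle \to \langle x, a\rangle$, and
$x \in A^\perp$.

(iv) That $\overline{[\![A]\!]}^\perp \subseteq A^\perp$ follows from $A \subseteq \overline{[\![A]\!]}$. Conversely, let $x \in A^\perp$; for any
$a, b \in A$,

$$\langle x, a + b\rangle = \langle x, a\rangle + \langle x, b\rangle = 0, \qquad \langle x, \lambda a\rangle = \lambda\langle x, a\rangle = 0,$$

so $x$ is orthogonal to the space generated by $A$, $x \in [\![A]\!]^\perp$. Let $a_n \to y$ with
$a_n \in [\![A]\!]$, then $0 = \langle x, a_n\rangle \to \langle x, y\rangle$ and $x \in \overline{[\![A]\!]}^\perp$. □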

Exercises 10.10 

1. If $T, S : X \to Y$ are linear maps on inner product spaces such that $\langle y, Tx\rangle =
\langle y, Sx\rangle$ for all $x \in X$, $y \in Y$, then $T = S$. Example 10.7(3) is false for real
spaces: Find a non-zero $2 \times 2$ real matrix $T$ such that $\langle x, Tx\rangle = 0$ for all
$x \in \mathbb{R}^2$.

2. The Cauchy-Schwarz inequality becomes an equality if, and only if, $x = \lambda y$
for some scalar $\lambda$ (or $y = 0$). Similarly, $\|x + y\| = \|x\| + \|y\|$ precisely when
$x = \lambda y$, $\lambda \ge 0$. More generally, $\|\sum_n x_n\| = \sum_n \|x_n\|$ if, and only if, $x_n = \lambda_n x$
for some $\lambda_n \ge 0$.

3. For any $x, y \in H$, find scalars $|\lambda| = 1 = |\mu|$ such that $\|x\|^2 + \|y\|^2 \le
\|\lambda x + \mu y\|^2$.
4. A vector space may have various inner products. When $T : X \to X$ is 1-1
and linear, $\langle\!\langle x, y\rangle\!\rangle := \langle Tx, Ty\rangle$ is another legitimate inner product on $X$. What
properties does $S$ need to have to ensure that $\langle\!\langle x, y\rangle\!\rangle := \langle x, Sy\rangle$ is also an inner
product?

5. * Every inner product on $\mathbb{R}^N$ is of the type $\langle x, Ay\rangle = \sum_{i,j} A_{ij} a_i b_j$ where $A$ is
a positive symmetric matrix. Deduce that balls have the shape of an ellipse in
$\mathbb{R}^2$, and of an ellipsoid in $\mathbb{R}^3$.

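6. ► The product of two inner product spaces, $X \times Y$, has an inner product defined
by

$$\left\langle \begin{pmatrix} x_1 \\ y_1 \end{pmatrix}, \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} \right\rangle := \langle x_1, x_2\rangle_X + \langle y_1, y_2\rangle_Y.$$

Then the maps $x \mapsto \begin{pmatrix} x \\ 0 \end{pmatrix}$ and $y \mapsto \begin{pmatrix} 0 \\ y \end{pmatrix}$ embed $X$ and $Y$ as orthogonal subspaces
of $X \times Y$. Although the induced norm is not the same one we defined for $X \times Y$
as normed spaces (Example 7.3(8)), the two norms are equivalent.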

When X, Y are complete, so is X x Y with the induced norm (note that 



7. In any inner product space, 

(a) $\|x - y\|^2 + \|x + y - 2z\|^2 = 2\|x - z\|^2 + 2\|y - z\|^2$.

(b) $\|x + y + z\|^2 + \|x + y - z\|^2 + \|x - y + z\|^2 + \|x - y - z\|^2
= 4(\|x\|^2 + \|y\|^2 + \|z\|^2)$.

8. Verify that the norms for $\ell^2$ and $L^2(\mathbb{R})$ satisfy the parallelogram law, and show
that the inner product obtained from the polarization identity is the same one
defined previously (Examples 10.2(3, 4)).

9. The 1-norm and $\infty$-norm defined on $\mathbb{R}^2$ do not come from inner products. Find
two vectors that do not satisfy the parallelogram law.

10. ► Similarly, $L^1(\mathbb{R})$ and $L^\infty(\mathbb{R})$ are not inner product spaces. Neither is
$B(X, Y)$ in general.

11. A norm $\|\cdot\|$ that satisfies the parallelogram law gives rise to its associated inner
product, by the polarization identity. In turn, this inner product induces the norm
$|||x||| := \sqrt{\langle x, x\rangle}$. Show that the two norms are identical.

12. The polynomials $x$ and $2x^2 - 1$ are orthogonal in $L^2[0, 1]$. So are sine and cosine
in the space $L^2[-\pi, \pi]$; can you find a function orthogonal to both?

13. $0^\perp = X$, $X^\perp = 0$. In fact, $A^\perp = X \Leftrightarrow A \subseteq \{0\}$. Do you think it is true that
$A^\perp = 0 \Leftrightarrow A = X$? What if $A$ is a closed linear subspace of $X$?

14. Show that (a) $(A + B)^\perp = A^\perp \cap B^\perp$, (b) $A^{\perp\perp\perp} = A^\perp$. (Hint: use property (ii)
of the proposition.)

15. Let $d := d(x, [\![y]\!]) = \inf_\lambda \|x + \lambda y\|$, where $y$ is a unit vector; show that (a)
$d = \|x + \lambda_0 y\|$ for some $\lambda_0$, (b) $|\langle x, y\rangle|^2 = \|x\|^2 - d^2$, and (c) $y \perp (x + \lambda_0 y)$.

16. To illustrate the strength of orthogonality, prove that if $M$ and $N$ are orthogonal
complete subspaces of $X$, then $M + N$ is also complete (Example 7.11(2)).

17. Suppose a vector space $X$ satisfies all the axioms for an inner product space
except that it contains non-zero vectors with $\langle x, x\rangle = 0$. Show that if $\langle x, x\rangle = 0$,
then $\forall y$, $\langle x, y\rangle = 0$ (Hint: expand $\|y - \lambda x\|^2$).

Deduce that Pythagoras' theorem and Cauchy-Schwarz's inequality remain
valid. Show that $Z := \{\, x : \langle x, x\rangle = 0 \,\}$ is a closed linear subspace, and that
there is a well-defined inner product on $X/Z$, $\langle x + Z, y + Z\rangle := \langle x, y\rangle$.

18. A light ‘ray’ has a frequency profile $f(\omega)$. Oversimplifying slightly, our eyes
convert it to a color vector $(\langle r, f\rangle, \langle g, f\rangle, \langle b, f\rangle)$ where $r(\omega)$, $g(\omega)$, $b(\omega)$ are
the absorption profiles of the retinal cones. So any two points (rays) in the coset
$f + \{r, g, b\}^\perp$ have the same color.


10.2 Least Squares Approximation 

By Exercise 10.10(15) above, the distance between a point and a line can be min- 
imized by a unique point on the line. This has a generalization with far-reaching 
consequences: 

Theorem 10.11 


If $M$ is a closed convex subset of a Hilbert space $H$, then any point in $H$
has a unique point in $M$ which is closest to it,

$$\forall x \in H,\ \exists!\, y^* \in M,\ \forall y \in M, \quad \|x - y^*\| \le \|x - y\|.$$


Proof Let $d := d(x, M) = \inf_{y \in M} \|x - y\|$ be the smallest distance from $M$ to $x$.
Then there is a sequence of vectors $y_n \in M$ such that $\|x - y_n\| \to d$. Now, using
the parallelogram law and the convexity of $M$, $(y_n)$ is a Cauchy sequence,

$$\begin{aligned}
\|y_n - y_m\|^2 &= 2\|y_n - x\|^2 + 2\|y_m - x\|^2 - \|(y_n + y_m) - 2x\|^2 \\
&= 2\|y_n - x\|^2 + 2\|y_m - x\|^2 - 4\left\|\frac{y_n + y_m}{2} - x\right\|^2 \\
&\le 2\|y_n - x\|^2 + 2\|y_m - x\|^2 - 4d^2 \\
&\to 0, \text{ as } n, m \to \infty.
\end{aligned}$$

But $M$ is closed (hence complete) and so $y_n \to y^* \in M$. It follows, by continuity of
the norm, that $\|x - y^*\| = \lim_{n \to \infty} \|x - y_n\| = d$.

Suppose $a \in M$ is another closest point to $x$, i.e., $\|x - a\| = d$. Then $y^* = a$
since

$$\begin{aligned}
\|y^* - a\|^2 &= 2\|y^* - x\|^2 + 2\|a - x\|^2 - \|(y^* + a) - 2x\|^2 \\
&\le 2\|y^* - x\|^2 + 2\|a - x\|^2 - 4d^2 \\
&= 0. \qquad □
\end{aligned}$$
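
A concrete instance of Theorem 10.11 is the closest point in a box, which is a closed convex subset of $\mathbb{R}^N$. The sketch below assumes the unit cube $[0, 1]^3$, where the minimizer is obtained by clamping each coordinate; the sample point and the random comparison points are illustrative choices only.

```python
import math
import random

def project_onto_box(x, lo=0.0, hi=1.0):
    """Closest point to x in the box [lo, hi]^N, found by clamping each coordinate."""
    return [min(max(t, lo), hi) for t in x]

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

random.seed(0)
x = [1.7, -0.4, 0.3]
closest = project_onto_box(x)

# No randomly drawn point of the box should be nearer to x than the clamped point.
candidates = [[random.random() for _ in x] for _ in range(1000)]
best_candidate = min(distance(x, y) for y in candidates)
```

The clamped point is at distance $\sqrt{0.65}$ from $x$, and none of the sampled points of the cube does better.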

Let us concentrate on the special case when M is a closed subspace of H. 

Theorem 10.12 


When $M$ is a closed linear subspace of a Hilbert space $H$, then $y \in M$ is
the closest point $y^*$ to $x \in H$ if, and only if,

$$x - y \in M^\perp.$$

The map $P : x \mapsto y^*$ is a continuous ‘orthogonal’ projection with $\mathrm{im}\,P =
M$ orthogonal to $\ker P = M^\perp$, so

$$H = M \oplus M^\perp.$$


Proof (i) Let $a$ be any non-zero point of $M$ and let
$b := x - (y^* + \lambda a)$ where $\lambda$ is chosen so that $a \perp b$, that
is, $\lambda := \langle a, x - y^*\rangle/\|a\|^2$. By Pythagoras' theorem,
we get

$$\|x - y^*\|^2 = \|b + \lambda a\|^2 = \|b\|^2 + \|\lambda a\|^2 \ge \|b\|^2$$

making $y^* + \lambda a$ even closer to $x$ than the closest point
$y^*$, unless $\lambda = 0$, i.e., $\langle a, x - y^*\rangle = 0$. Since $a$ is
arbitrary, this gives $x - y^* \perp M$.

Conversely, if $(x - y) \perp a'$ for any $a' \in M$, then $(x - y) \perp (a' - y)$ and
Pythagoras' theorem implies

$$\|x - a'\|^2 = \|x - y\|^2 + \|y - a'\|^2$$

so that $\|x - y\| \le \|x - a'\|$, making $y$ the closest point in $M$ to $x$.

(ii) By the above, for any $x \in H$, $P(x)$ is that unique vector in $M$ such that $x - P(x) \in
M^\perp$. This characteristic property has the following consequences:

• $P$ is linear since

$$(x + y) - (Px + Py) = (x - Px) + (y - Py) \in M^\perp, \qquad Px + Py \in M,$$

hence $P(x + y) = Px + Py$. Similarly, $P(\lambda x) = \lambda Px$.

• The closest point in $M$ to $a \in M$ is $a$ itself, i.e., $Pa = a$, so $\mathrm{im}\,P = M$. Since
$Px \in M$, it also follows that $P^2 x = Px$, and $P^2 = P$.

• When $x \in M^\perp$, then $x - 0 \in M^\perp$ and $0 \in M$ so $Px = 0$. As $Px = 0$ implies
$x = x - Px \in M^\perp$, this justifies $\ker P = M^\perp$.

• $P$ is continuous since $\|x\|^2 = \|x - Px\|^2 + \|Px\|^2$ by Pythagoras' theorem so
that $\|Px\| \le \|x\|$.

Finally $H = \mathrm{im}\,P \oplus \ker P = M \oplus M^\perp$, since any vector can be decomposed as
$x = Px + (x - Px)$, and $M \cap M^\perp = 0$. □
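
The properties of $P$ established in this proof can be observed for a two-dimensional subspace of $\mathbb{R}^3$. The sketch below builds $P$ from an orthonormal basis obtained by Gram-Schmidt, which is one standard way to compute the projection and is an assumption of the example, as are the spanning vectors and the sample point.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis of the span of `vectors` (dependent ones dropped)."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = dot(e, w)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        length = math.sqrt(dot(w, w))
        if length > tol:
            basis.append([wi / length for wi in w])
    return basis

def project(x, basis):
    """Px = sum of <e, x> e over an orthonormal basis of M."""
    px = [0.0] * len(x)
    for e in basis:
        c = dot(e, x)
        px = [pi + c * ei for pi, ei in zip(px, e)]
    return px

spanning = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]   # M is the span of these two vectors
basis = orthonormal_basis(spanning)
x = [1.0, 2.0, 3.0]
px = project(x, basis)
residual = [a - b for a, b in zip(x, px)]        # x - Px, expected to lie in M-perp
```

For this data $Px = (1/3, 8/3, 7/3)$; the residual is orthogonal to both spanning vectors, $P(Px) = Px$, and $\|Px\| \le \|x\|$.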

Corollary 10.13 

For any subset $A \subseteq H$, $A^{\perp\perp} = \overline{[\![A]\!]}$.


Proof Let $M$ be a closed linear subspace of a Hilbert space $H$. By Proposition 10.9,
$M \subseteq M^{\perp\perp}$, so we require the opposite inclusion. Let $x \in M^{\perp\perp}$, then $x = a + b$
where $a \in M$ and $b \in M^\perp$, and

$$0 = \langle b, x\rangle = \langle b, a\rangle + \langle b, b\rangle = \|b\|^2,$$

forcing $b = 0$ and $x \in M$; thus $M^{\perp\perp} \subseteq M$. In particular, $A^{\perp\perp} = \overline{[\![A]\!]}^{\perp\perp} = \overline{[\![A]\!]}$. □

Note that $M^\perp = 0 \Leftrightarrow M = M^{\perp\perp} = 0^\perp = H$, answering Exercise 10.10(13) in
the case of a closed linear subspace of a Hilbert space.

Examples 10.14 

1. Let $M := \{\, f \in L^2[0,1] : \int_0^1 f = 0 \,\}$. To find that function $f_0$ in $M$ which most
closely approximates a given function $g$, we first note

\[ M = \{\, f \in L^2[0,1] : \langle 1, f\rangle = 0 \,\} = \{1\}^\perp, \quad\text{so}\quad M^\perp = [\![1]\!]. \]

Then $f_0$ must satisfy $f_0 \in M$ and $g - f_0 \in M^\perp$, i.e., $f_0 = g + \lambda$ and
$0 = \int_0^1 f_0 = \int_0^1 g + \lambda$, hence $f_0 = g - \int_0^1 g$.

2. The "affine" projection onto a plane with equation $x \cdot n = d$ ($n$ a unit vector) is
given by $P(x) := x + (d - x \cdot n)n$.





Proof Translate all points $x \mapsto y := x - dn$, so that the plane becomes the
subspace $M$ with equation $y \cdot n = 0$, i.e., $M = \{n\}^\perp$. The required point $y_0 \in M$
satisfies $(y - y_0) \cdot y' = 0$ for all $y' \in M$, so $y_0 = y + \alpha n$. Dotting with $n$ implies
$\alpha = -y \cdot n = d - x \cdot n$, which can be substituted into $x_0 = x + \alpha n$.

3. A projection is orthogonal if, and only if, $\|P\| = 1$ (unless $P = 0$).

Proof Using $\langle x - Px, Px\rangle = 0$ and the Cauchy-Schwarz inequality,

\[ \|Px\|^2 = \langle x, Px\rangle \leq \|x\|\|Px\|, \]

so $\|Px\| \leq \|x\|$; but $Px = x$ for $x \in \operatorname{im} P$, so $\|P\| = 1$. Conversely, let
$a \in \ker P$, $b \in \operatorname{im} P$; then for any $\lambda$,

\[ \|b\|^2 = \|P(\lambda a + b)\|^2 \leq \|\lambda a + b\|^2 = |\lambda|^2\|a\|^2 + 2\operatorname{Re} \lambda\langle b, a\rangle + \|b\|^2, \]

and after letting $\lambda = |\lambda|e^{i\theta}$ with $|\lambda| \to 0$, we find $\operatorname{Re} e^{i\theta}\langle b, a\rangle \geq 0$ for any $\theta$,
hence $\langle b, a\rangle = 0$.

4. ► $[\![A]\!]$ is dense in $H$ if, and only if, $A^\perp = 0$.

Proof If $A^\perp = 0$, then $\overline{[\![A]\!]} = A^{\perp\perp} = 0^\perp = H$. Conversely, if $[\![A]\!]$ is dense in $H$,
then $A^\perp = \overline{[\![A]\!]}^\perp = H^\perp = 0$.
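
The affine projection of Example 2 can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function name `project_onto_plane` and the sample plane and point are illustrative assumptions. It normalises $n$ first, so that the formula $P(x) = x + (d - x \cdot n)n$ for a unit normal applies.

```python
import math

def project_onto_plane(x, n, d):
    """Closest point to x on the plane {p : p . n = d}.

    n need not be a unit vector: it is normalised first (rescaling d
    by the same factor), after which P(x) = x + (d - x.n) n applies.
    """
    norm = math.sqrt(sum(c * c for c in n))
    unit = [c / norm for c in n]
    d_unit = d / norm
    alpha = d_unit - sum(xi * ui for xi, ui in zip(x, unit))
    return [xi + alpha * ui for xi, ui in zip(x, unit)]

# Hypothetical example: the plane x + 2y + 2z = 3 and the point (3, 3, 3).
n = [1.0, 2.0, 2.0]
x = [3.0, 3.0, 3.0]
p = project_onto_plane(x, n, 3.0)

# The projected point lies on the plane, and x - p is parallel to n.
on_plane = sum(pi * ni for pi, ni in zip(p, n))
residual = [xi - pi for xi, pi in zip(x, p)]
```

Here `p` should come out as $(5/3, 1/3, 1/3)$ up to rounding, and `residual` as a multiple of `n`, which is the orthogonality condition $x - Px \in M^\perp$ in translated form.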

Application: Least Squares Approximation 

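A common problem in mathematical applications is to approximate a generic vector
$x$ by one which is more easily handled, such as a linear combination of simpler
vectors $y_1, \ldots, y_N$. For Hilbert spaces, there is a guarantee that a unique closest
approximation exists, and this lies at the heart of the method of least squares.

Let $M := [\![y_1, \ldots, y_N]\!]$, a closed linear subspace of $H$; then the closest point
in $M$ to $x$ is $y_* = \sum_{j=1}^N \alpha_j y_j$ such that $x - y_* \perp M$. Since $M$ is generated by
$y_1, \ldots, y_N$, this is equivalent to

\[ \langle y_i, x - y_*\rangle = 0, \quad i = 1, \ldots, N, \qquad\text{that is,}\qquad \sum_{j=1}^N \langle y_i, y_j\rangle\, \alpha_j = \langle y_i, x\rangle. \]

These $N$ linear equations in the $N$ unknowns $\alpha_1, \ldots, \alpha_N$ can be recast in matrix form,

\[ \begin{pmatrix} \langle y_1, y_1\rangle & \cdots & \langle y_1, y_N\rangle \\ \vdots & \ddots & \vdots \\ \langle y_N, y_1\rangle & \cdots & \langle y_N, y_N\rangle \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix} = \begin{pmatrix} \langle y_1, x\rangle \\ \vdots \\ \langle y_N, x\rangle \end{pmatrix}. \]

Given $x$, the coefficients $\alpha_j$ can be found by solving these equations. The Gram
matrix $[\langle y_i, y_j\rangle]$, and possibly its inverse, need only be calculated once, and used to
approximate other points.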


Example The space of cubic polynomials, $a + bx + cx^2 + dx^3$, is a four-dimensional
closed linear subspace of the Hilbert space $L^2[0, 1]$, with basis $1, x, x^2, x^3$. Their
Gram matrix and inverse are given by

\[ \begin{pmatrix} 1 & 1/2 & 1/3 & 1/4 \\ 1/2 & 1/3 & 1/4 & 1/5 \\ 1/3 & 1/4 & 1/5 & 1/6 \\ 1/4 & 1/5 & 1/6 & 1/7 \end{pmatrix}^{-1} = \begin{pmatrix} 16 & -120 & 240 & -140 \\ -120 & 1200 & -2700 & 1680 \\ 240 & -2700 & 6480 & -4200 \\ -140 & 1680 & -4200 & 2800 \end{pmatrix}. \]

So, to approximate the sine function by a cubic polynomial over the region $[0, 1]$, we
first calculate $\langle x^i, \sin\rangle_{L^2[0,1]}$, which work out to $(0.460, 0.301, 0.223, 0.177)$, and
then apply the inverse of the Gram matrix to it, giving

\[ p(x) \approx -0.000253 + 1.005x - 0.0191x^2 - 0.144x^3. \]

Notice that the coefficients are close to, but not the same as, the first terms of
the MacLaurin expansion of sine. The difference is that, whereas the MacLaurin
expansion is accurate at 0 and becomes progressively worse away from it,
the $L^2$-approximation balances out the 'root-mean-square error' throughout the
region $[0, 1]$.
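
The calculation in this example can be reproduced in a few lines. The sketch below is an illustration in plain Python rather than the book's own method: the helper `solve` (Gaussian elimination with partial pivoting) is an assumed utility, and the inner products $\langle x^i, \sin\rangle$ are entered in closed form, obtained by integration by parts. It solves the Gram system directly instead of forming the inverse matrix.

```python
import math

def solve(G, b):
    """Solve G a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(G, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    a = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * a[c] for c in range(r + 1, n))
        a[r] = (M[r][n] - s) / M[r][r]
    return a

# Gram matrix of 1, x, x^2, x^3 in L^2[0,1]: <x^i, x^j> = 1/(i + j + 1).
G = [[1.0 / (i + j + 1) for j in range(4)] for i in range(4)]

# <x^i, sin> on [0,1] for i = 0..3, in closed form (integration by parts).
s1, c1 = math.sin(1.0), math.cos(1.0)
b = [1 - c1, s1 - c1, 2 * s1 + c1 - 2, 5 * c1 - 3 * s1]

coeffs = solve(G, b)   # coefficients of 1, x, x^2, x^3
```

The entries of `coeffs` should agree with the cubic quoted above to the digits shown. The same Gram matrix can be reused to approximate any other function once its four inner products with $1, x, x^2, x^3$ are known.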


Exercises 10.15 

1. Find the closest point in the plane $2x + y - 3z = 0$ to a point $\mathbf{x} \in \mathbb{R}^3$. (Hint:
Find $M^\perp$.)

2. Let (a) $M := [\![y]\!]$, or (b) $M := \{y\}^\perp$, where $y$ is a unit vector. The orthogonal
projection $P$ which maps any point $x$ to its closest point in $M$ is (a) $Px = \langle y, x\rangle y$,
(b) $Px = x - \langle y, x\rangle y$.

3. ► In the decomposition $x = a + b$ with $a \in M$ and $b \in M^\perp$, $a$ and $b$ are unique.
Deduce that if $H = M \oplus N$, where $M$ is a closed linear subspace and $M \perp N$,
then $N = M^\perp$.

4. Let $a + M$ be a coset of a closed linear subspace $M$. Show that there is a
unique vector $x \in a + M$ with smallest norm. (Hint: this is equivalent to finding
the closest vector in $M$ to $-a$.) Deduce that Riesz's lemma (Proposition 8.20)
continues to hold in a Hilbert space even when $c = 1$.

5. If $M \subseteq N$ are both closed linear subspaces, then $M \oplus (M^\perp \cap N) = N$.

6. Let $T$ be a square matrix, and suppose both subspaces $M$ and $M^\perp$ are $T$-invariant,
so that $T$ takes the schematic form

\[ \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}. \]

Show that $\|T\| = \max(\|A\|, \|B\|)$. (Hint: take $x = a + b$, then $\|Tx\|^2 = \|Aa\|^2 + \|Bb\|^2$.)

7. ► There is a 1-1 correspondence between closed linear subspaces of a Hilbert
space and orthogonal projections (onto them). Properties about subspaces are
reflected as properties of the projections, e.g. if the orthogonal projections $P_M$
and $P_N$ project onto $M$ and $N$ respectively, then

(a) $M \subseteq N \iff P_M P_N = P_M = P_N P_M$,

(b) $M \perp N \iff P_M P_N = 0 = P_N P_M$,

(c) $N = M^\perp \iff I = P_M + P_N$,

(d) $M$ is $T$-invariant $\iff T P_M = P_M T P_M$,

(e) $M$ and $M^\perp$ are both $T$-invariant $\iff T P_M = P_M T$.

8. (a) Since $\langle x, a\rangle = \langle Px, a\rangle$ for any point $a$ in a closed linear subspace $M$, it follows
that $|\langle x, a\rangle| \leq \|Px\|\|a\|$ with equality when $a \in [\![Px]\!]$. Deduce that in
a real Hilbert space, the angle between $x$ and $a$ is at least $\cos^{-1}(\|Px\|/\|x\|)$.

(b) Let $H = M \oplus N$ with $M$, $N$ non-zero closed subspaces. Show that there is
a minimum distance $d > 0$ between the disjoint closed sets of unit vectors of $M$ and
of $N$; thus for any unit vectors $x \in M$, $y \in N$, $\|x - y\| \geq d > 0$.
Deduce that $\operatorname{Re}\langle x, y\rangle \leq \alpha := 1 - d^2/2$, and hence that

\[ \forall x \in M,\ \forall y \in N, \quad |\langle x, y\rangle| \leq \alpha\|x\|\|y\|. \]

9. The main theorem, which does not refer to inner products, is not true in Banach
spaces in general.

(a) In $\mathbb{R}^2$ with the 1-norm, the vector $\binom{1}{1}$ has many closest vectors in the closed
ball $B_1[0]$.

(b) In $\ell^\infty$, there are many sequences in $c_0$ that have the minimum distance to
$(1, 1, \ldots)$.

(c) Show that, in a normed space, the set of best approximations in a convex set
$M$ to a point $x$ is convex.

(d) * On the other hand, in $\ell^\infty$, the sequence $0$ has no closest sequence in the
closed convex set $M := \{\, (a_n) \in c_0 : \sum_n a_n/2^n = 1 \,\}$.

10. * Consider two orthogonal projections $P$ and $Q$ in $\mathbb{R}^N$. Show that the iteration
$y_{n+1} := QPy_n$ starting from $y_0 = x$ converges to a point $x_* \in \operatorname{im} P \cap \operatorname{im} Q$.

11. Find

(a) the best-fitting quadratic and cubic polynomials to the sine function in
$[0, 2\pi]$,

(b) the linear combination of sin and cos which is closest to $1 - x^3$ in $L^2[0, 1]$.

12. (a) The Gram matrix of vectors $y_1, \ldots, y_N$ is $G := A^*A$ where the columns
of $A$ are $y_n$, and the rows of $A^*$ are $y_n^*$. It is invertible when $y_n$ are linearly
independent.

(b) Show that in order to write a vector $x$ as a linear combination of basis vectors
$x = \sum_{n=1}^N \alpha_n y_n$, given the numbers $b_n := \langle y_n, x\rangle$, then one needs to solve
the matrix equation $G\alpha = b$.

(c) Given the total mass and moment of inertia of a radially symmetric planar
object,

\[ M = 2\pi \int_0^R \rho(r)\, r\, dr = 2\pi\langle r, \rho(r)\rangle_{L^2[0,R]}, \]

\[ I = 2\pi \int_0^R \rho(r)\, r^3\, dr = 2\pi\langle r^3, \rho(r)\rangle_{L^2[0,R]}, \]

find an estimate of $\rho(r)$ as some function $\alpha + \beta r$.

13. The symmetric Gram matrix of a set of real vectors $x_n$ is useful in other
contexts as well. Show how to recover

(a) the vectors $x_n$ from their Gram matrix, up to an isomorphism (use diagonalization
to find $A$ such that $A^2 = G$),

(b) the Gram matrix of the vectors from the mutual distances between vectors
$d_{ij}$, and their norms $r_i$,

(c) the Gram matrix from $d_{ij}$ only, assuming $\sum_n x_n = 0$.

This is essentially what is done in the Global Positioning System, when 3 to 4
distances obtained by time-lags from satellites are converted to a position.

10.3 Duality $H^* \approx H$

An inner product is a function acting on two variables. But if one input vector is
fixed, it becomes a scalar-valued function on vectors, indeed a continuous functional

\[ x^* : X \to \mathbb{F}, \qquad y \mapsto \langle x, y\rangle. \]

This is linear by the inner product axioms, while continuity follows from the Cauchy-Schwarz
inequality $|x^*y| = |\langle x, y\rangle| \leq \|x\|\|y\|$.

Are there any other functionals besides these? Not when the space is complete:

Theorem 10.16 Riesz representation theorem

Every continuous functional of a Hilbert space $H$ is of the form $x^* := \langle x, \cdot\,\rangle$,

\[ \forall \phi \in H^*,\ \exists!\, x \in H, \quad \phi = \langle x, \cdot\,\rangle. \]

The Riesz map

\[ J : H \to H^*, \qquad x \mapsto x^* \]

is a bijective conjugate-linear isometry.


Proof (i) Given $\phi \in H^*$, first notice that for any $z$ and $y$ in $H$,

\[ (\phi y)z - (\phi z)y \in \ker \phi. \]

Assuming $\phi \neq 0$, pick a unit vector $z \perp \ker \phi$; this is possible since $\ker \phi \neq H$ so
$(\ker \phi)^\perp \neq 0$. Then

\[ 0 = \langle z, (\phi y)z - (\phi z)y\rangle = (\phi y) - (\phi z)\langle z, y\rangle, \]

\[ \phi y = (\phi z)\langle z, y\rangle = \langle x, y\rangle, \]

where $x = \overline{(\phi z)}\, z$. To show that it is unique, suppose $\tilde{x}$ is another such $x$, then

\[ \forall y \in H, \quad \langle \tilde{x} - x, y\rangle = \langle \tilde{x}, y\rangle - \langle x, y\rangle = \phi y - \phi y = 0 \ \Rightarrow\ \tilde{x} = x. \]

(ii) Part (i) proves that $J$ is onto and 1-1. Let $x$ and $y$ be two vectors in $H$. Then for
any $z \in H$,

\[ (x + y)^*(z) = \langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle = x^*z + y^*z, \]
\[ (\lambda x)^*(z) = \langle \lambda x, z\rangle = \bar{\lambda}\langle x, z\rangle = \bar{\lambda}\, x^*z, \]

showing that $(x + y)^* = x^* + y^*$ and $(\lambda x)^* = \bar{\lambda} x^*$ (conjugate-linear).
To see that $J$ is isometric, note that

\[ \|x^*\|_{H^*} = \sup_{y \neq 0} \frac{|x^*y|}{\|y\|} = \sup_{y \neq 0} \frac{|\langle x, y\rangle|}{\|y\|} = \|x\|, \]

using the Cauchy-Schwarz inequality, in particular with $y = x$.

□



Frigyes Riesz (1880-1956) was a Hungarian mathematics professor
who proved that $L^2(\mathbb{R})$ is complete; in 1907, with E.S. Fischer,
he proved that Hilbert's $\ell^2$ space is equivalent to $L^2(\mathbb{R})$; he
defined compact operators abstractly for more general spaces,
including $C[a,b]$ (1918); he introduced the resolvent projection
to part of the spectrum and thus $f(T)$ for compact operators.

Fig. 10.2 Riesz 
Examples 10.17 

1. The dual space of $\mathbb{R}$ is (isomorphic to) $\mathbb{R}$ itself. Any $\phi : \mathbb{R} \to \mathbb{R}$ that is linear
must be of type $\phi(t) = \lambda t$ where $\lambda \in \mathbb{R}$.

2. Functionals are simply row vectors when $H = \mathbb{C}^N$; thus $H^*$ is isometric to $\mathbb{C}^N$
and is generated by the dual basis $e_1^\top, \ldots, e_N^\top$.

Proof Let $e_1, \ldots, e_N$ be the standard basis for $\mathbb{C}^N$. Then every functional $\phi$ in
$(\mathbb{C}^N)^*$ is of the type $\phi = (b_n)^\top$, where $b_n := \phi e_n$ (Example 8.4(3)). Thus the
map $\mathbb{C}^N \to (\mathbb{C}^N)^*$, $y \mapsto y^\top$, where $y^\top x := y \cdot x$, is onto; it is easily seen to
be linear, and continuous from Cauchy's inequality $|y \cdot x| \leq \|y\|\|x\|$. In fact
$\|y^\top\| = \|y\|$ (using $x = (\bar{b}_n)$). Note that $y^\top = \sum_n b_n e_n^\top$, and $e_n^\top e_m = \delta_{nm}$.

3. It was noted previously that $\ell^{2*} = \ell^2$ and $L^2(\mathbb{R})^* = L^2(\mathbb{R})$ (Exercise 9.23(3)).
These are special cases of the Riesz correspondence.

Exercises 10.18 

1. ► For $T \in B(X, Y)$ ($X$, $Y$ Hilbert spaces),

\[ \|x\| = \sup_{\|y\|=1} |\langle y, x\rangle|, \qquad \|T\| = \sup_{\|x\|=1=\|y\|} |\langle y, Tx\rangle|. \]

2. Show that the norm of $H^*$ comes from the inner product $\langle x^*, y^*\rangle_{H^*} := \langle y, x\rangle_H$.

3. A functional $\phi \in H^*$ corresponds to some vector $x \in H$; if $M$ is a closed linear
subspace of $H$, $\phi$ can be restricted to act on it, $\tilde{\phi} \in M^*$. As $M$ is a Hilbert space
in its own right, what vector $a \in M$ corresponds to $\tilde{\phi}$?

4. A second inner product on $H$ which satisfies $|\langle\!\langle x, y\rangle\!\rangle| \leq c\|x\|\|y\|$ must be of
the type $\langle\!\langle x, y\rangle\!\rangle = \langle Tx, y\rangle = \langle x, Ty\rangle$, where $T \in B(H)$, $\|T\| \leq c$.

5. Riesz's theorem holds only for complete inner product spaces (it is false for, say,
$c_{00} \subset \ell^2$). Where is completeness used in the proof of the theorem?

10.4 The Adjoint Map $T^*$


We now seek to find a generalization of the transpose operation on matrices. In
finite dimensions, we have $(A^*v)^* = v^*A$; in terms of inner products, this becomes
$\langle A^*v, x\rangle = \langle v, Ax\rangle$. In this form, it can be generalized to any Hilbert space:

Definition 10.19 


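The (Hilbert) adjoint of an operator $T : X \to Y$ between Hilbert spaces, is
the operator $T^* : Y \to X$ uniquely defined by the relation

\[ \langle T^*y, x\rangle_X = \langle y, Tx\rangle_Y \qquad \forall x \in X,\ y \in Y. \]


That $T^*y$ is uniquely defined follows from the Riesz correspondence applied to
the functional $x \mapsto \langle y, Tx\rangle$. Linearity and continuity of $T^*$ follow from

\[ \langle T^*(y_1 + y_2), x\rangle = \langle y_1 + y_2, Tx\rangle = \langle y_1, Tx\rangle + \langle y_2, Tx\rangle = \langle T^*y_1 + T^*y_2, x\rangle, \]

\[ \langle T^*(\lambda y), x\rangle = \langle \lambda y, Tx\rangle = \bar{\lambda}\langle y, Tx\rangle = \langle \lambda T^*y, x\rangle, \]

\[ \|T^*\| = \sup_{\|y\|=1=\|x\|} |\langle T^*y, x\rangle| = \sup_{\|y\|=1=\|x\|} |\langle y, Tx\rangle| = \|T\|. \]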
The properties of the adjoint map are: 

Proposition 10.20 


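\[ (S + T)^* = S^* + T^*, \qquad (\lambda T)^* = \bar{\lambda} T^*, \qquad (ST)^* = T^*S^*, \]

\[ I^* = I, \qquad T^{**} = T, \qquad \|T^*T\| = \|T\|^2. \]


Proof These assertions follow from the following identities, valid for all $x \in X$,
$y \in Y$:

\[ \langle (S+T)^*y, x\rangle = \langle y, (S + T)x\rangle = \langle y, Sx\rangle + \langle y, Tx\rangle = \langle (S^* + T^*)y, x\rangle, \]

\[ \langle (\lambda T)^*y, x\rangle = \langle y, \lambda Tx\rangle = \lambda\langle T^*y, x\rangle = \langle \bar{\lambda} T^*y, x\rangle, \]

\[ \langle (ST)^*y, x\rangle = \langle y, STx\rangle = \langle S^*y, Tx\rangle = \langle T^*S^*y, x\rangle, \]

\[ \langle I^*y, x\rangle = \langle y, Ix\rangle = \langle y, x\rangle, \]

\[ \langle y, T^{**}x\rangle = \overline{\langle T^{**}x, y\rangle} = \overline{\langle x, T^*y\rangle} = \langle T^*y, x\rangle = \langle y, Tx\rangle, \]

\[ \|T^*T\| = \sup_{x,y \in S} |\langle y, T^*Tx\rangle| = \sup_{x,y \in S} |\langle Ty, Tx\rangle| = \sup_{x,y \in S} \|Ty\|\|Tx\| = \|T\|^2, \]

where $S := \{\, x : \|x\| = 1 \,\}$, and the equation before the last is valid by the Cauchy-Schwarz
inequality, in particular choosing $y = x$. □

The following proposition reveals an orthogonality between subspaces of adjoint
operators. In particular, both $M$ and $M^\perp$ are $T$-invariant if, and only if, $M$ is $T$- and
$T^*$-invariant.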

Proposition 10.21 


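For an operator $T$ on Hilbert spaces,

\[ \ker T^* = (\operatorname{im} T)^\perp, \qquad \overline{\operatorname{im} T^*} = (\ker T)^\perp. \]

If $T \in B(H)$ and $M$ is a closed linear subspace of $H$,

\[ M \text{ is } T\text{-invariant} \iff M^\perp \text{ is } T^*\text{-invariant}. \]


Proof The definition $\langle T^*x, y\rangle = \langle x, Ty\rangle$ implies that

\[ x \perp Ty \iff T^*x \perp y, \]

in particular $x \perp \operatorname{im} T \iff T^*x \perp y$ for every $y$ $\iff x \in \ker T^*$. Consequently, $\ker T^* = (\operatorname{im} T)^\perp$
and thus $\ker T = \ker T^{**} = (\operatorname{im} T^*)^\perp$; furthermore,

\[ (\ker T)^\perp = (\operatorname{im} T^*)^{\perp\perp} = \overline{\operatorname{im} T^*}. \]

Suppose $M$ is $T$-invariant, and let $x \in M^\perp$, $y \in M$, then $\langle T^*x, y\rangle = \langle x, Ty\rangle = 0$,
and $T^*x \in M^\perp$. Conversely, if $M^\perp$ is $T^*$-invariant then $M^{\perp\perp}$ is $T^{**}$-invariant; but
$T^{**} = T$ and $M^{\perp\perp} = M$ for a closed subspace $M$. □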


Unitary Operators 
Definition 10.22 


A unitary isomorphism $J : X \to Y$ of inner product spaces is defined as a
map which preserves all the structure of an inner product space, namely

$J$ is bijective (preservation of the elements),

$J$ is linear (preservation of vector addition and scalar multiplication), and

$\langle Jx, Jy\rangle_Y = \langle x, y\rangle_X$ (preservation of the inner product).

It is obvious that a unitary isomorphism preserves the induced norm (an isometry);
the converse is also partly true in Hilbert spaces, because, by the polarization identity,
the inner product can be written in terms of norms:

Proposition 10.23 


An operator $U \in B(X, Y)$ on Hilbert spaces preserves the inner product
when $U$ preserves the norm,

\[ \forall x, \tilde{x} \in X,\ \langle U\tilde{x}, Ux\rangle = \langle \tilde{x}, x\rangle \iff U^*U = I \iff \|Ux\| = \|x\| \ \ \forall x \in X. \]

$U$ is unitary when it is also onto.


This statement is basically saying that preserving the inner product (lengths and
'angles') is equivalent to preserving lengths.

Proof The first equivalence is trivial

\[ \forall x, \tilde{x}, \quad \langle \tilde{x}, x\rangle = \langle U\tilde{x}, Ux\rangle = \langle \tilde{x}, U^*Ux\rangle \iff U^*U = I. \]

In particular (taking $\tilde{x} = x$), $U$ is isometric. The converse implication from the third
statement to the first follows from the polarization identity (10.1),

\[ \langle Ux, Uy\rangle = \tfrac{1}{4}\bigl(\|Ux + Uy\|^2 + \cdots\bigr) = \tfrac{1}{4}\bigl(\|x + y\|^2 + \cdots\bigr) = \langle x, y\rangle. \]

A superficially different proof of this last fact can be given for complex Hilbert
spaces (Example 10.7(3)),

\[ \forall x, \quad \langle x, x\rangle = \langle Ux, Ux\rangle = \langle x, U^*Ux\rangle \iff U^*U = I. \]

Since isometries are 1-1, we need only require in addition that it is onto for $U$ to be
invertible, in which case $U^{-1} = U^*$. □

Examples 10.24 

1. The adjoint of a matrix $A = [A_{ij}]$ is the conjugate of its transpose, $\bar{A}^\top$, since

\[ \langle x, Ay\rangle = \sum_{ij} \bar{x}_i A_{ij} y_j = \sum_{ij} \overline{(\bar{A}_{ij} x_i)}\, y_j = \langle \bar{A}^\top x, y\rangle. \]

2. ► The adjoint of the left-shift operator (on $\ell^2$) is the right-shift, $L^* = R$, since

\[ \langle L^*y, x\rangle = \langle y, Lx\rangle = \sum_{n=0}^\infty \bar{b}_n a_{n+1} = \sum_{n=1}^\infty \bar{b}_{n-1} a_n = \langle Ry, x\rangle, \]

and $R^* = L^{**} = L$.

3. The adjoint of an integral operator on $L^2(\mathbb{R})$,

\[ Tf(y) := \int k(x,y) f(x)\, dx \quad\text{is}\quad T^*g(x) = \int \overline{k(x,y)}\, g(y)\, dy. \]

Proof

\[ \langle g, Tf\rangle = \int \overline{g(y)} \int k(x,y) f(x)\, dx\, dy = \iint \overline{\overline{k(x,y)}\, g(y)}\, f(x)\, dy\, dx = \int \overline{\int \overline{k(x,y)}\, g(y)\, dy}\; f(x)\, dx = \langle T^*g, f\rangle. \]

4. The unitary² isomorphisms of $\mathbb{R}^2$ are the rotations and reflections. More generally,
those of $\mathbb{R}^n$ are the matrices whose columns are orthonormal (mutually
orthogonal and of unit norm).

Proof The column vectors $u_i$ of a unitary matrix $U$ satisfy $u_i = Ue_i$, where $e_i$
are the standard basis for $\mathbb{R}^n$. Then, $\langle u_i, u_j\rangle = \langle Ue_i, Ue_j\rangle = \langle e_i, e_j\rangle = \delta_{ij}$.

5. ► By itself, $U^*U = I$ ensures that a linear operator $U : X \to Y$ is isometric
(and 1-1), but not that it is onto, that is, it is an isometric embedding of $X$ into
$Y$. For example, the matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 0 \end{pmatrix}$ embeds $\mathbb{R}^2$ into $\mathbb{R}^3$. In general, $UU^*$ is not
equal to $I$ but is a projection of $Y$ onto $\operatorname{im} U \subseteq Y$.

Proof Clearly, $UU^*UU^* = UU^*$ is a projection from $Y$ to $\operatorname{im} U$. It is onto
since $UU^*(Ux) = Ux$.
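
Examples 1 and 5 lend themselves to a quick numerical check. The following Python sketch is illustrative only: the helper names (`adjoint`, `inner`, `apply`, `matmul`) and the particular complex matrix are assumptions made for the example. The inner product is taken conjugate-linear in its first argument, matching the convention used in this chapter.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def adjoint(A):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[complex(v).conjugate() for v in col] for col in zip(*A)]

def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inner(x, y):
    """<x, y>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

# Example 1: <x, A y> = <A* x, y> for a sample complex 3x2 matrix.
A = [[1 + 2j, 3], [0, 1j], [2, -1]]   # maps C^2 -> C^3
x = [1j, 2, 1 - 1j]                   # a vector in C^3
y = [3, 1 + 1j]                       # a vector in C^2
lhs = inner(x, apply(A, y))
rhs = inner(apply(adjoint(A), x), y)

# Example 5: the isometric embedding of R^2 into R^3.
U = [[0, 1], [1, 0], [0, 0]]
UstarU = matmul(adjoint(U), U)        # identity on R^2
UUstar = matmul(U, adjoint(U))        # projection onto im U, not the identity
```

Here `lhs` and `rhs` agree, `UstarU` is the $2 \times 2$ identity, and `UUstar` is the diagonal matrix with entries $1, 1, 0$, which is idempotent but not the identity on $\mathbb{R}^3$.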


10.5 Inverse Problems 

When an operator $T : X \to Y$ is not onto, the equation $Tx = y$ need not have a
solution. The next best thing to ask for is a vector $x$ which minimizes $\|Tx - y\|$.


2 More properly called orthogonal isomorphisms when the space is real.

Proposition 10.25


For an operator $T : H_1 \to H_2$ between Hilbert spaces and a vector
$y \in H_2$, a vector $x \in H_1$ minimizes $\|Tx - y\|$ if, and only if

\[ T^*Tx = T^*y. \]


Proof Suppose $T \in B(X, Y)$, and consider the closed linear subspace $M := \overline{\operatorname{im} T} \subseteq Y$.
For each $y \in Y$, there is a unique vector $y_* \in M$ which is closest to it. As proved in
Theorem 10.12, a necessary and sufficient condition for $y_*$ is $y - y_* \in M^\perp = \ker T^*$
(Proposition 10.21), that is, $T^*y_* = T^*y$.

If $y_*$ happens to be in $\operatorname{im} T$, i.e., $y_* = Tx$, then the equation becomes $T^*Tx = T^*y$;
this can only occur when $y \in \operatorname{im} T \oplus (\operatorname{im} T)^\perp$, a dense subspace of $Y$. When
$\operatorname{im} T$ is closed, e.g. in finite dimensions, this is the case for all $y \in Y$.

If $y_* \notin \operatorname{im} T$ then we can only conclude that there is some sequence of vectors
$x_n \in X$ such that $Tx_n \to y_*$, and so $T^*Tx_n \to T^*y$. Thus $\|Tx_n - y\|$ converges to
$\|y_* - y\|$, but is never equal to it (by uniqueness of $y_*$). □

To continue this discussion, the above situation in the case of finite dimensions
is typical of an overdetermined system of equations, that is, a system $Tx = b$ that
represents more equations than there are unknowns. The least squares solution is
then found to be

\[ x = (T^*T)^{-1}T^*b \]

at least in the generic case when $T$ is 1-1. Then $T^*T$ is also 1-1 since $T^*Tx = 0 \Rightarrow
\|Tx\|^2 = \langle x, T^*Tx\rangle = 0 \Rightarrow x = 0$, so it is invertible at least on $\operatorname{im} T^*$.

The dual problem is that of an underdetermined system of equations, $Tx = b$,
where there are fewer equations than unknowns. There is an oversupply of solutions,
namely any vector in $x_0 + \ker T$, where $x_0$ is any single solution of the equation, and
$\ker(T^*T) = \ker T \neq 0$. In this case, a unique $x$ that is closest to 0 can be selected from
all these solutions, i.e., has the least norm. That is, we seek $x \in (\ker T)^\perp = \operatorname{im} T^*$ (in
finite dimensions, every subspace is closed). Thus $x = T^*y$ and $b = Tx = TT^*y$,
so the required least norm vector is

\[ x = T^*(TT^*)^{-1}b. \]

In the general case, an operator need be neither 1-1 nor onto, so the set of vectors
which minimize $\|Tx - y\|$ is a coset, $x + \ker T$. But since $\ker T$ is a closed subspace,
it has a unique vector with smallest norm. The mapping from $y$ to this $x \in (\ker T)^\perp$
is then well-defined for $y \in \operatorname{im} T + (\operatorname{im} T)^\perp$ and is denoted by $T^\dagger$, called the Moore-Penrose
pseudo-inverse. To recap,

\[ T^\dagger : \operatorname{im} T + (\operatorname{im} T)^\perp \subseteq Y \to X, \]

\[ y \mapsto x \quad\text{where } T^*Tx = T^*y,\ x \in (\ker T)^\perp. \]

In the simple case when $T$ is invertible, so $\operatorname{im} T = Y$, it reduces to the usual inverse
$T^\dagger = (T^*T)^{-1}T^* = T^{-1}$. For example, every $m \times n$ matrix and vector has a
pseudo-inverse, e.g. $x^\dagger = x^*/\|x\|^2$, so that $x^\dagger x = 1$ (except that $0^\dagger = 0$).



The equations introduced above have found an extremely fertile scope for applications.
In many scientific or engineering contexts, an abundant number of measurements
of a few variables in general gives an overdetermined system of equations.
This also occurs when there is loss of information during measurement, so that the
'space of measurements' ($\operatorname{im} T$) is a proper subspace of the space of variables ($H$).
A small sample of applications is given below:


Regression 

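To find the best-fitting (least-squares) line $y = mx + c$ to $N$ given points $(x_n, y_n) \in \mathbb{R}^2$,
minimizing the errors in $y_n$, we require that $mx_n + c$ be collectively as close to $y_n$
as possible. In matrix form, we require

\[ \begin{pmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{pmatrix} \begin{pmatrix} m \\ c \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \]

written as $A\mathbf{m} = \mathbf{b}$. As this usually has no exact solution, the best alternative is
$A^*A\mathbf{m} = A^*\mathbf{b}$,

\[ \begin{pmatrix} \sum_n x_n^2 & \sum_n x_n \\ \sum_n x_n & N \end{pmatrix} \begin{pmatrix} m \\ c \end{pmatrix} = \begin{pmatrix} \sum_n x_n y_n \\ \sum_n y_n \end{pmatrix}. \]

Solving for $\mathbf{m} = \binom{m}{c}$ gives the usual regression line as used in statistics.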
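
As a concrete illustration, the $2 \times 2$ normal equations can be solved in closed form. The sketch below is a minimal Python example under stated assumptions: the function name `regression_line` and the four sample points are invented for illustration, and the rows of $A$ are taken to be $(x_n, 1)$.

```python
def regression_line(points):
    """Least-squares line y = m x + c through (x_n, y_n), via A*A m = A*b."""
    N = len(points)
    Sx = sum(x for x, _ in points)
    Sxx = sum(x * x for x, _ in points)
    Sy = sum(y for _, y in points)
    Sxy = sum(x * y for x, y in points)
    det = Sxx * N - Sx * Sx          # zero only if all x_n coincide
    m = (N * Sxy - Sx * Sy) / det
    c = (Sxx * Sy - Sx * Sxy) / det
    return m, c

# Hypothetical data, roughly following a line of slope 2.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 4.0), (3.0, 8.0)]
m, c = regression_line(points)

# The residual b - A m is orthogonal to both columns of A.
residuals = [y - (m * x + c) for x, y in points]
orth_x = sum(r * x for r, (x, _) in zip(residuals, points))
orth_1 = sum(residuals)
```

For these points the slope and intercept come out as $m = 2.2$ and $c = 0.7$, and both `orth_x` and `orth_1` vanish up to rounding, which is the condition $A^*(\mathbf{b} - A\mathbf{m}) = 0$.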

This technique is not at all restricted to fitting straight lines. Suppose it is required
to approximate data points $(x_n, y_n)$ by a quadratic polynomial $a + bx + cx^2$. This is the
same as trying to solve the matrix equation

\[ \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_N & x_N^2 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}. \]

Repeating the above procedure gives the solution

\[ \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \frac{1}{\Delta} \begin{pmatrix} (S_2^2 - S_1S_3)\sum_n x_n^2 y_n + (S_1S_4 - S_2S_3)\sum_n x_n y_n + (S_3^2 - S_2S_4)\sum_n y_n \\ (S_0S_3 - S_1S_2)\sum_n x_n^2 y_n + (S_2^2 - S_0S_4)\sum_n x_n y_n + (S_1S_4 - S_2S_3)\sum_n y_n \\ (S_1^2 - S_0S_2)\sum_n x_n^2 y_n + (S_0S_3 - S_1S_2)\sum_n x_n y_n + (S_2^2 - S_1S_3)\sum_n y_n \end{pmatrix} \]

where $S_k = \sum_n x_n^k$, and $\Delta = S_2^3 - 2S_1S_2S_3 + S_1^2S_4 - S_0S_2S_4 + S_0S_3^2$. (Note: In
practice, one does not need to program these formulae; multiplying out $T^*T$ as a
numerical matrix and solving $T^*Tx = T^*b$ directly is usually a better option.)


Tikhonov Regularization 

The Moore-Penrose pseudo-inverse is usually either not a continuous operator or
has a large condition number; its solutions tend to fluctuate with slight changes in
the data (e.g. errors). To address this deficiency, a number of different regularization
techniques are employed whose aim is to improve the ill-conditioning. One of the
more popular techniques is attributed to Tikhonov; it balances out finding the best
approximate solution of $Tx = y$ with $x$ having a small norm by seeking the minimum
of $\|Tx - y\|^2 + \alpha\|x\|^2$, where $\alpha > 0$ is some pre-determined parameter.

To solve this minimization problem, consider the following more general formulation:
Let $H$ be a real Hilbert space and suppose $A \in B(H)$, $b \in H$, and $c \in \mathbb{R}$; to
find the minimum of the quadratic function $q : H \to \mathbb{R}$,

\[ q(x) := \langle x, Ax\rangle + \langle b, x\rangle + c. \]

Taking small variations of the minimum point $x$, namely $x + tv$, we deduce

\[ \forall t \in \mathbb{R},\ \forall v \in H, \quad q(x) \leq q(x + tv) = \langle x + tv, Ax + tAv\rangle + \langle b, x + tv\rangle + c, \]
\[ \therefore\quad 0 \leq t\langle v, Ax + A^*x + b\rangle + t^2\langle v, Av\rangle, \]
\[ \forall t > 0, \quad -t\langle v, Av\rangle \leq \langle v, Ax + A^*x + b\rangle \leq t\langle v, Av\rangle. \]

As $t$ and $v$ are arbitrary, it must be the case that $x$ satisfies

\[ (A + A^*)x + b = 0. \]

In particular, minimizing $\|Tx - y\|^2 = \langle x, T^*Tx\rangle - 2\langle T^*y, x\rangle + \|y\|^2$ gives the
equation inferred previously, $T^*Tx = T^*y$. Similarly, that $x$ which minimizes

\[ \|Tx - y\|^2 + \alpha\|x\|^2 = \langle x, (T^*T + \alpha I)x\rangle - 2\langle T^*y, x\rangle + \|y\|^2 \]

solves the equation

\[ (T^*T + \alpha I)x = T^*y. \]

This is the regularized version of the last proposition. It will be proved later that
$T^*T + \alpha I$ is always invertible (regular) for $\alpha > 0$ (Proposition 15.42). This gives
an excellent alternative to the Moore-Penrose solution when $y \notin \operatorname{im} T + (\operatorname{im} T)^\perp$,
although choosing the parameter $\alpha$ may not be straightforward.
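
The stabilizing effect of the parameter can be seen on a small ill-conditioned system. The following Python sketch is only an illustration: the function `tikhonov_2x2`, the matrix, the perturbed data and the value $10^{-3}$ of the regularization parameter are all assumptions chosen for the example. It solves the regularized normal equations for a real $2 \times 2$ matrix by Cramer's rule.

```python
def tikhonov_2x2(T, y, alpha):
    """Solve (T^T T + alpha I) x = T^T y for a real 2x2 matrix T."""
    (a, b), (c, d) = T
    # Entries of the symmetric matrix T^T T + alpha I.
    g11 = a * a + c * c + alpha
    g12 = a * b + c * d
    g22 = b * b + d * d + alpha
    # Right-hand side T^T y.
    r1 = a * y[0] + c * y[1]
    r2 = b * y[0] + d * y[1]
    det = g11 * g22 - g12 * g12      # strictly positive whenever alpha > 0
    return [(g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det]

# An ill-conditioned matrix; with data (2, 2.0001) the exact solution is (1, 1).
T = [[1.0, 1.0], [1.0, 1.0001]]
y_noisy = [2.0, 2.0002]              # second entry perturbed by 1e-4

x_plain = tikhonov_2x2(T, y_noisy, 0.0)    # unregularized: far from (1, 1)
x_reg = tikhonov_2x2(T, y_noisy, 1e-3)     # regularized: close to (1, 1)
```

The unregularized solution jumps to roughly $(0, 2)$ under a perturbation of size $10^{-4}$ in the data, while the regularized solution stays within about $10^{-3}$ of $(1, 1)$, at the cost of a small bias.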


Algebraic Reconstruction Technique 


ART is an iterative algorithm that generates a solution $x$ of the (real) equation $Ax = b$.
The matrix equation can be rewritten as $\langle a_n, x\rangle = b_n$, $n = 1, \ldots, N$, where $a_n$ are the
rows of $A$. The iteration is defined in terms of affine projections (Example 10.14(2))

\[ x_n = x_{n-1} + \frac{b_n - \langle a_n, x_{n-1}\rangle}{\|a_n\|^2}\, a_n, \qquad x_0 \in H. \]

The indices of $a_n$ and $b_n$ are to be understood as modulo $N$ ($a_{N+1} = a_1$, etc). We
show below that starting from any $x_0 \in H$, the iteration converges to the closest
point $x_*$ to $x_0$ that is a solution of $Ax_* = b$. Note that starting from $x_0 = 0$ results
in the Moore-Penrose inverse.

To see why this works, let $M_n := \{a_n\}^\perp$ (cycling through $n = 1, \ldots, N$), then
$M := \bigcap_n M_n$ contains all the solutions of $Av = 0$; let also $v_n := x_n - x_*$. The
iteration becomes

\[ v_n = v_{n-1} - \langle \hat{a}_n, v_{n-1}\rangle \hat{a}_n = P_n v_{n-1} \in M_n, \]

where $\hat{a}_n = a_n/\|a_n\|$, and $P_n$ is the projection onto the hyperplane $M_n$. Notice that
$v_0 = x_0 - x_* \in M^\perp$, as well as $v_n - v_{n-1} \in M^\perp$, so the entire sequence $v_n$ lies in
$M^\perp$.

Consider the operator $Q := P_N \cdots P_1$ acting on $M^\perp$; its norm is bounded by 1
because $\|P_i\| \leq 1$ for each $i$. If $1 = \|Q\| = \sup_{\|w\|=1} \|Qw\|$, then the supremum
is achieved by some unit vector $w \in M^\perp$ since the unit ball is compact in finite
dimensions and $w \mapsto \|Qw\|$ is a continuous function. Denote $w_i := P_i w_{i-1} =
w_{i-1} - \langle \hat{a}_i, w_{i-1}\rangle \hat{a}_i$, with $w_0 := w$; then

\[ 1 = \|Qw\| = \|P_N w_{N-1}\| \leq \|w_{N-1}\| \leq \|w_{N-2}\| \leq \cdots \leq \|w_1\| \leq \|w\| = 1 \]

forces all $w_i$ to have norm 1. But, since $\|w_{i-1}\|^2 = \|w_i\|^2 + |\langle \hat{a}_i, w_{i-1}\rangle|^2$, it follows
that $\langle \hat{a}_i, w_{i-1}\rangle = 0$ and $w_i = w_{i-1}$ for $i = 1, \ldots, N$. Hence $w \in M_1 \cap \cdots \cap M_N =
M$, yet $w \in M^\perp$ is a unit vector.

This contradiction implies $\|Qv\| \leq c\|v\|$, $c < 1$, for any $v \in M^\perp$. Hence
$\|v_{n+N}\| = \|Qv_n\| \leq c\|v_n\|$; combined with $\|v_{n+1}\| \leq \|v_n\|$, we get $v_n \to 0$.

Equivalently, $x_n$ converges to $x_*$.

The advantages of ART are that it uses less computer memory and is flexible in 
that it can be used even if there is missing data or newly available data (missing or 
new rows of A); but, being an iterative procedure, it is generally slower to converge. 
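
The iteration is short enough to try out directly. The following Python sketch is illustrative, not a reference implementation: the function name `art`, the fixed number of sweeps and the small underdetermined system are assumptions chosen so that the limits can be computed by hand. It cycles through the rows of a consistent real system, applying one affine projection per row.

```python
def art(A, b, x0, sweeps=200):
    """Cyclic ART iteration for a consistent real system A x = b.

    Each inner step projects the current point onto the hyperplane
    <a_n, x> = b_n defined by one row a_n of A.
    """
    x = list(x0)
    for _ in range(sweeps):
        for row, bn in zip(A, b):
            norm2 = sum(a * a for a in row)
            step = (bn - sum(a * xi for a, xi in zip(row, x))) / norm2
            x = [xi + step * a for xi, a in zip(x, row)]
    return x

# Two equations in three unknowns: x1 + x2 = 2, x2 + x3 = 2.
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
b = [2.0, 2.0]

x_min = art(A, b, [0.0, 0.0, 0.0])     # least-norm solution
x_other = art(A, b, [3.0, 0.0, 0.0])   # solution closest to (3, 0, 0)
```

Starting from the origin the iterates approach $(2/3, 4/3, 2/3)$, which is $A^*(AA^*)^{-1}b$ for this system; starting from $(3, 0, 0)$ they approach $(5/3, 1/3, 5/3)$, the solution closest to that starting point.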


Wiener Deconvolution 

When a signal $f \in L^2(\mathbb{R})$ passes through a 'circuit' (which could be the atmosphere,
say, or a measuring apparatus), it is modified in two ways: (i) the signal is distorted
slightly to $Kf := k * f$, where $k \in L^1(\mathbb{R})$ is characteristic of the circuit (recall
convolution, Example 8.6(5)), (ii) random noise in the process adds a little error
$e \in L^2(\mathbb{R})$ to the signal. The net effect is a distorted output signal $y = k * f + e$. Is
it possible to extract the original signal $f$ back again from $y$? A full reconstruction
by solving $Kf = y$ is impossible as lost information cannot be regained; the $\operatorname{im} K$
subspace is not the full space $L^2(\mathbb{R})$, and the error displaces the signal off this
subspace. But one can use Tikhonov regularization and solve $(K^*K + \alpha)f = K^*y$.
The simplest way to do this is to use the properties of the Fourier transform, which
converts convolution to multiplication. As in Example 10.24(3), the adjoint of $K$ is
given by $K^*g = k^- * g$ where $k^-(t) := \overline{k(-t)}$, since

\[ \langle K^*g, f\rangle = \langle g, Kf\rangle = \iint \overline{g(s)}\, k(s-t) f(t)\, dt\, ds = \int \overline{\int \overline{k(s-t)}\, g(s)\, ds}\; f(t)\, dt. \]

The Fourier transform of $k^-$ is

\[ \widehat{k^-}(\xi) = \int e^{-2\pi i \xi t}\, \overline{k(-t)}\, dt = \int e^{2\pi i \xi t}\, \overline{k(t)}\, dt = \overline{\hat{k}(\xi)}, \]

so that $(K^*K + \alpha)f = K^*y$ transforms to

\[ \hat{f} = \frac{\overline{\hat{k}}\, \hat{y}}{|\hat{k}|^2 + \alpha}. \]

This is a recipe for finding $f$ from $y$, called deconvolution, that is commonly implemented
as a computer program using the Fast Fourier Transform, or directly as an
electrical filter circuit.
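
A discrete analogue of this recipe can be sketched in a few lines. The Python example below works under explicit assumptions that differ from the continuous setting of the text: signals are finite sequences, the convolution is circular, the transform is a plain (slow) discrete Fourier transform rather than an FFT, and no noise is added. The names `dft`, `circular_convolve` and `wiener_deconvolve`, and the sample signal and kernel, are illustrative.

```python
import cmath

def dft(v, sign=-1):
    """Unnormalised discrete Fourier transform; sign=+1 gives the inverse kernel."""
    N = len(v)
    return [sum(v[n] * cmath.exp(sign * 2j * cmath.pi * m * n / N) for n in range(N))
            for m in range(N)]

def circular_convolve(k, f):
    """(k * f)_n = sum_m k_{n-m} f_m with indices taken modulo N."""
    N = len(f)
    return [sum(k[(n - m) % N] * f[m] for m in range(N)) for n in range(N)]

def wiener_deconvolve(k, y, alpha):
    """Solve (K*K + alpha) f = K*y in the Fourier domain."""
    N = len(y)
    k_hat, y_hat = dft(k), dft(y)
    f_hat = [kh.conjugate() * yh / (abs(kh) ** 2 + alpha)
             for kh, yh in zip(k_hat, y_hat)]
    # Inverse transform; the result is real for real k and y.
    return [v.real / N for v in dft(f_hat, sign=+1)]

# Hypothetical signal and blurring kernel (the kernel's transform has no zeros).
f = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0, 0.0, 0.0]
k = [0.6, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
y = circular_convolve(k, f)

f_rec = wiener_deconvolve(k, y, alpha=1e-9)
error = max(abs(u - v) for u, v in zip(f, f_rec))
```

With noiseless data and a very small parameter the original sequence is recovered almost exactly. With noisy data a larger parameter would be needed, trading some bias for stability at frequencies where $|\hat{k}|$ is small.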
Fig. 10.3 Image reconstruction, (i) The original image, (ii) after it passes an imaging device (exaggerated),
(iii) the best-fit image


Image Reconstruction 

An image can also be considered as a 'signal', this time in $L^2(\mathbb{R}^2)$, or, when discretized,
as a vector of numbers in the form of an array of pixels. Each number
represents the brightness of a pixel (neglecting the color content for simplicity). An
imaging apparatus transforms the original image $x$ to $y = Ax + e$, where $A$ is
assumed to be a linear operator, as above; examples include a slight spherical aberration
or blurring in general. Since such modification incurs a loss of information,
the distortion matrix $A$ is not invertible, but the best-fit "regularized" solution
$x = (A^*A + \alpha I)^{-1}A^*y$ restores the image somewhat, as seen in Fig. 10.3.

In practice, implementing the reconstruction encounters difficulties that are specific
to images. Images are typically in the order of about a million pixels in size;
the matrix $A$ would therefore consist of about a trillion coefficients (most of which
are zero), and finding the inverse of $A^*A + \alpha I$ is prohibitively time-consuming.
Fortunately, blurring is to a good approximation usually independent of the pixel
positions; for example, a linear motion blur produces the same streaks everywhere
across the picture (but note that this is not true for a rotation blur). In mathematical
terms, the transformation $A$ can be taken to be translation invariant, so that it is
equivalent to the convolution by some vector $k \in H$. With this simplification, image
reconstruction becomes a 2-dimensional version of Wiener deconvolution; the same
technique using the Fourier transform can be applied,

\[ \hat{x} = \frac{\overline{\hat{k}}\, \hat{y}}{|\hat{k}|^2 + \alpha}. \]

Here, $\hat{y}$ represents the discrete version of the Fourier transform, namely $\hat{y}_m =
\sum_n e^{-2\pi i mn/N} y_n$ (for $N$ samples). The resulting $x$ may have negative coefficients; these are
meaningless and usually replaced by 0.


Tomography 

Suppose that instead of a vector $x$, one is given 'views' of it, $y_n := \langle a_n, x\rangle$, where
$a_n$ is a list of known vectors: Is it possible to reconstruct $x$ from these views? If
$a_n$ are assembled as rows of a matrix $A$, one obtains a matrix equation $Ax = y$.
In such problems, it may be the case that the number of views is less than the
dimension of the vector space, so that the system is under-determined, or that there
are a large number of views, making the equation over-determined. In either case, a
least-squares solution can be found as above, using the techniques of inverse problem
solving (Fig. 10.4).

Fig. 10.4 Computed tomography, (i) The original image (360 x 360 pixels), (ii) 80 parallel 'views'
of the object, (iii) the best-fit reconstruction from 6,400 views (80 directions)

CT scans: An x-ray passing through a 3-D object of density $f$ diminishes in
intensity by an amount $e^{-\int f(a + bt)\, dt}$ where $a + bt$ is the straight line followed by the
ray. The emitted and received intensity can be measured and, after taking logs, one
obtains a 'view' of the object

\[ y = \int f(a + bt)\, dt = \langle L_{a,b}, f\rangle, \]

where $L_{a,b}$ is the characteristic function of the ray, i.e., a function that is 1 along the
ray and 0 outside it (in practice, the ray has a finite width). It should be possible to
reconstruct $f$ from a large number of these views. A CT-scan does precisely this: an
x-ray source coupled with a detector rotate around the object to produce these views.

In one simple configuration, $b = \binom{\cos\theta}{\sin\theta}$ and $a = s\binom{-\sin\theta}{\cos\theta}$: the collection
of these views, as a function of $\theta$ and $s$, is called the Radon transform $R$ of $f$. The
best-fit $f$ that reproduces the data is computed by solving $(R^*R + \alpha)f = R^*y$, either
directly in the form of the optimized Filtered Back Projection (FBP) algorithm or
by iterative algorithms such as some variants of ART. Other configurations include
a fixed source and a rotating detector, producing a fan-shaped collection of rays.
In yet other applications, the 'rays' move along curved lines; more generally, the
output may depend non-linearly on $f$ and the source (see [21] for an overview of
tomography and inverse scattering theory).

The idea obviously has lots of potential: x-ray tomography has revolutionized 
medical diagnosis, archaeology, and fossil analysis; crystal x-ray diffraction 
tomography recreates the atomic configuration of molecules in a lattice; impedance 
tomography takes output currents from input voltages to reconstruct the interior 
resistance density of an object; seismographs measure the output vibrations after the
occurrence of earthquakes to reconstruct the interior density of the Earth; gravity, 
magnetic, or sound measurements at the Earth’s surface can determine rock densi- 
ties underneath, aiding in the exploration for oil or minerals; ultrasound echoes or 
scattered light can be used to reconstruct 3-D images of internal organs (or of moths 
and fish/squid by bats and dolphins). The list is long and increasing! 

Exercises 10.26 

1. If $T$ is invertible then $(T^{-1})^* = (T^*)^{-1}$.

2. Use $\|T^*T\| = \|T\|^2$ to show $\|T^*\| = \|T\|$.

3. ► The adjoint of the multiplier operator in $\ell^2$, $x \mapsto ax$, is $y \mapsto \bar{a}y$.

4. Let $a \in \ell^1(\mathbb{Z})$, then Young’s inequality (Exercise 9.15(5)) shows that the linear
map $x \mapsto a * x$ is continuous on $\ell^2(\mathbb{Z})$. Its adjoint is $y \mapsto \tilde{a} * y$ where
$\tilde{a}_n := \bar{a}_{-n}$.

5. The Volterra operator on $L^2[0, 1]$, $Vf(x) := \int_0^x f$, has adjoint $V^*f(x) = \int_x^1 f$.

6. Let $\langle\langle x, y\rangle\rangle := \langle x, Ay\rangle$ be a new inner product, then the adjoint of $T$ with respect
to it is $T^\star := (ATA^{-1})^*$.

7. If $R \in B(X, Y)$ then $T \mapsto RTR^*$ is an operator $B(X) \to B(Y)$.

8. For any $T \in B(H_1, H_2)$, $\ker(T^*T) = \ker T$ and $\overline{\operatorname{im} T^*T} = \overline{\operatorname{im} T^*}$.

9. A linear map $T : X \to Y$ is said to be conformal when it preserves orthogonality,

$$ \forall x, y \in X, \quad \langle x, y\rangle = 0 \;\Rightarrow\; \langle Tx, Ty\rangle = 0. $$

Show that this is the case if, and only if, $T^*T = \lambda I$ for some $\lambda \ge 0$. Moreover,
angles between vectors are preserved (for $\lambda > 0$).

In particular, two inner products on the same vector space are conformal when
$\langle\langle x, y\rangle\rangle = \lambda\langle x, y\rangle$ for some $\lambda > 0$.

10. * Show that a map between Hilbert spaces which preserves the inner product
must be linear. Deduce that isometries on a real Hilbert space must be of the
type $f(x) = Ux + a$ where $U^*U = I$ and $a \in H$.

(Hint: Let $g(x) := f(x) - f(0)$, an isometry; show $\langle g(x + y), g(z)\rangle =
\langle g(x) + g(y), g(z)\rangle$, so $g(x + y) - g(x) - g(y) \in [\![\operatorname{im} g]\!] \cap (\operatorname{im} g)^\perp$.)

11. Find best approximate solutions for



12. To find the best-fitting plane $z = ax + by + c$ to a number of points $(x_n, y_n, z_n)$,
where $z_n$ is the dependent variable, least squares approximation gives

$$ \begin{pmatrix} \sum_n 1 & \sum_n x_n & \sum_n y_n \\ \sum_n x_n & \sum_n x_n^2 & \sum_n x_n y_n \\ \sum_n y_n & \sum_n x_n y_n & \sum_n y_n^2 \end{pmatrix} \begin{pmatrix} c \\ a \\ b \end{pmatrix} = \begin{pmatrix} \sum_n z_n \\ \sum_n x_n z_n \\ \sum_n y_n z_n \end{pmatrix}. $$

13. * The method is not at all restricted to linear geometric objects. Find the
best-fitting circle $x^2 + y^2 + ax + by = c$ to a number of points $(x_n, y_n)$.

14. The pseudo-inverse of the left-shift operator on $\ell^2$ is the right-shift operator, and
vice versa.

15. For any $T \in B(X, Y)$, $TT^\dagger T = T$, because both $x$ and $T^\dagger Tx$ belong to $x + \ker T$.
So $T^\dagger T$ and $TT^\dagger$ are projections; which precisely?

16. The transformation $T^\dagger : \operatorname{im} T \oplus (\operatorname{im} T)^\perp \to (\ker T)^\perp$ is linear but continuous only
when $\operatorname{im} T$ is closed (Hint: if $Tx_n \to y$ then $Tx_n = TT^\dagger Tx_n \to TT^\dagger y$).

17. Recall the Volterra 1-1 operator $Vf(x) := \int_0^x f$ on $L^2[0, 1]$. If $g$ is
differentiable, then $V^\dagger g = g'$, and the Tikhonov regularization solves the equation
$f - \alpha f'' = g'$.

18. An oscillating pendulum is captured on video at 25 frames/s. The angle $\theta$ (in
rad) that the pendulum makes with the vertical, for 1 s worth of frames (1-26),
is given in the table below. Theoretically, $\theta$ satisfies

$$ \ddot\theta + \frac{\kappa}{m}\dot\theta^2 + \frac{g}{r}\sin\theta = 0, $$

where $g = 9.81\,\mathrm{m\,s^{-2}}$, and $\kappa/m$ and $r$ are unknown numbers. From the data,
estimate $\dot\theta_n$ by $(\theta_{n+1} - \theta_{n-1})/2\delta t$, and $\ddot\theta_n$ by $(\theta_{n+1} - 2\theta_n + \theta_{n-1})/\delta t^2$, thereby
getting equations of the type $ax_n + by_n = z_n$, where $x_n = \dot\theta_n^2$, $y_n = \sin\theta_n$,
$z_n = -\ddot\theta_n$, and $a$, $b$ are unknown constants. Use regression to find $a$, $b$ (hence $r$
and $\kappa/m$) that best fit these data.


Frame   θ (rad)     Frame   θ (rad)
  1      0.372        14     -0.972
  2      0.210        15     -0.923
  3      0.043        16     -0.854
  4     -0.126        17     -0.756
  5     -0.291        18     -0.640
  6     -0.447        19     -0.505
  7     -0.589        20     -0.353
  8     -0.714        21     -0.192
  9     -0.816        22     -0.025
 10     -0.900        23      0.144
 11     -0.957        24      0.308
 12     -0.988        25      0.462
 13     -0.993        26      0.600


19. Phylogeny. Bioinformaticians can create a score of how far apart two species 
are genetically. An example is given in the adjoining table, together with the 
suspected evolutionary tree. Assign constants to each edge in the tree which best 
match the given scores, i.e., the sum of the edge constants along the path from, 
say, A to D should be as close to 6.16 as possible. 
       A       B       C       D
B    2.22      -
C    6.12    5.60      -
D    6.16    5.70    1.70      -
E    5.79    5.06    3.12    3.72
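As an illustration of how the regression exercises above can be set up numerically, here is a hedged sketch for the pendulum data of Exercise 18. It assumes the model $\ddot\theta + \frac{\kappa}{m}\dot\theta^2 + \frac{g}{r}\sin\theta = 0$ and the central-difference estimates described there; the variable names are illustrative, and the fitted values depend on the rounding of the data, so none are quoted here.

```python
import numpy as np

# Angles (rad) for frames 1..26, sampled at 25 frames per second.
theta = np.array([
    0.372, 0.210, 0.043, -0.126, -0.291, -0.447, -0.589, -0.714, -0.816,
    -0.900, -0.957, -0.988, -0.993, -0.972, -0.923, -0.854, -0.756, -0.640,
    -0.505, -0.353, -0.192, -0.025, 0.144, 0.308, 0.462, 0.600,
])
dt = 1 / 25
g = 9.81

# Central differences; only the interior frames 2..25 have both neighbours.
theta_dot = (theta[2:] - theta[:-2]) / (2 * dt)
theta_ddot = (theta[2:] - 2 * theta[1:-1] + theta[:-2]) / dt**2

# Equations a*x_n + b*y_n = z_n with x_n = theta_dot^2, y_n = sin(theta_n), z_n = -theta_ddot.
x = theta_dot**2
y = np.sin(theta[1:-1])
z = -theta_ddot

design = np.column_stack([x, y])
(a, b), *_ = np.linalg.lstsq(design, z, rcond=None)

kappa_over_m = a      # coefficient of theta_dot^2
r = g / b             # since b = g / r
```

Second differences amplify the rounding error in the data, so the small coefficient `a` is determined much less reliably than `b`.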



10.6 Orthonormal Bases 


Definition 10.27 


An orthonormal basis of a Hilbert space $H$ is a set of orthonormal vectors $E$
whose span is dense,

$$ \forall e_i, e_j \in E, \quad \langle e_i, e_j\rangle = \delta_{ij}, \qquad \overline{[\![E]\!]} = H. $$


The second condition is equivalent to $E^\perp = 0$ (Example 10.14(4)), i.e.,
$$ (\forall e \in E,\ \langle e, x\rangle = 0) \;\Rightarrow\; x = 0. $$


Examples 10.28 

1. The sequences $e_n := (\delta_{ni}) = (0, \dots, 0, 1, 0, \dots)$ form an orthonormal basis for $\ell^2$.

Proof Orthonormality is obvious,

$$ \langle e_n, e_m\rangle = \langle (\underbrace{0, \dots, 0, 1}_{n}, 0, \dots), (\underbrace{0, \dots, 0, 1}_{m}, 0, \dots)\rangle = \delta_{nm}. $$

If the sequence $x = (a_0, a_1, \dots)$ is in $[\![e_0, e_1, \dots]\!]^\perp$, then $a_n = \langle e_n, x\rangle_{\ell^2} = 0$
for any $n$; hence $x = 0$.

2. Gram-Schmidt orthogonalization: Any countable number of vectors $\{v_n\}$ can
be replaced by a set of orthonormal vectors having the same span, using the
Gram-Schmidt algorithm:

$$ u_0 := v_0, \qquad e_0 := u_0/\|u_0\|, $$
$$ u_n := v_n - \sum_{i=0}^{n-1} \langle e_i, v_n\rangle e_i, \qquad e_n := u_n/\|u_n\|. $$

It may very well happen that $u_n = 0$, in which case it and $v_n$ are discarded
and $v_{n+1}$ relabeled as $v_n$. Clearly, $[\![e_0, \dots, e_n]\!] = [\![v_0, \dots, v_n]\!]$, not taking the
discarded $v_n$ into account. Hence $[\![e_0, e_1, \dots]\!] = [\![v_0, v_1, \dots]\!]$.
3. Suppose $x = \sum_m a_m e_m$ for an orthonormal basis $\{e_0, e_1, e_2, \dots\}$; then taking
the inner product with $e_n$ gives the simple formula $a_n = \langle e_n, x\rangle$. The next section
discusses whether every $x$ can be so written.

4. The set of basis vectors need not be countable; when uncountable, the Hilbert
space is not separable, because the vectors $e_n$ are equally distant from each
other, $\|e_n - e_m\| = \sqrt{2}$, so that the balls $B_\epsilon(e_n)$ are disjoint for $\epsilon < \sqrt{2}/2$
(Exercise 4.21(4)). Conversely, if $E := \{e_n\}$ is a countable orthonormal basis,
then $[\![E]\!]$, and $H = \overline{[\![E]\!]}$, are separable.

5. * Every Hilbert space has an orthonormal basis. 

Proof Consider the collection of all orthonormal sets of vectors. It is nonempty,
so Hausdorff’s maximality principle implies that there is a maximal chain of
orthonormal sets $E_\alpha$. But $E := \bigcup_\alpha E_\alpha$ is also an orthonormal set, for pick any
two distinct vectors $e_\alpha \in E_\alpha$ and $e_\beta \in E_\beta \subseteq E_\alpha$, say, then $e_\alpha \perp e_\beta$. So $E$ is
a maximal set of orthonormal vectors. $E^\perp = 0$, otherwise $E$ can be extended
further, so $\overline{[\![E]\!]} = H$.
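The Gram-Schmidt algorithm of Example 2 above can be sketched in a few lines for vectors in $\mathbb{C}^N$. This is an illustrative sketch only: the function name `gram_schmidt` and the tolerance used to decide that $u_n = 0$ are assumptions, and `np.vdot` is used because it conjugates its first argument, matching the convention $a_n = \langle e_n, x\rangle$.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize a list of vectors, discarding those for which u_n = 0."""
    basis = []
    for v in vectors:
        v = np.asarray(v, dtype=complex)
        # u_n := v_n - sum_i <e_i, v_n> e_i  (np.vdot conjugates its first argument)
        u = v - sum(np.vdot(e, v) * e for e in basis)
        norm = np.linalg.norm(u)
        if norm > tol * max(np.linalg.norm(v), 1.0):   # otherwise v_n is discarded
            basis.append(u / norm)
    return basis

vs = [[1, 1, 0], [2, 2, 0], [1, 0, 1]]      # the second vector is dependent on the first
es = gram_schmidt(vs)
gram = np.array([[np.vdot(a, b) for b in es] for a in es])
```

Here the second vector is discarded, so two orthonormal vectors remain and `gram` is the $2 \times 2$ identity matrix up to rounding.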


Fourier Expansion 

The utility of orthonormal bases lies in the ease of calculation of the inner product: 

Proposition 10.29 Parseval’s identity 

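If $x = \sum_n \alpha_n e_n$ and $y = \sum_n \beta_n e_n$, where $\{e_n\}$ are orthonormal, then

$$ \langle x, y\rangle = \sum_n \bar\alpha_n \beta_n = \sum_n \langle x, e_n\rangle\langle e_n, y\rangle. $$

In particular, $\|x\| = \big(\sum_n |\alpha_n|^2\big)^{1/2}$.


Proof A simple expansion of the two series in the inner product, making essential
use of the linearity and continuity of $\langle\,,\rangle$ as well as orthonormality, gives the result:

$$ \langle x, y\rangle = \sum_n \sum_m \bar\alpha_n \beta_m \langle e_n, e_m\rangle = \sum_n \bar\alpha_n \beta_n. \qquad \square $$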

Parseval’s identity is the generalization of Pythagoras’ theorem to infinite dimen- 
sions. The question remains: when can a vector be written as a series of orthonormal 
vectors? The next proposition and theorem give an answer. 
Proposition 10.30


Let $\{e_1, e_2, \dots\}$ be a countable orthonormal set of vectors in a Hilbert
space $H$, then

$$ \sum_{n=1}^{\infty} \alpha_n e_n \text{ converges in } H \iff (\alpha_n) \in \ell^2. $$


Proof By Pythagoras’ theorem we have

$$ \|\alpha_n e_n + \cdots + \alpha_m e_m\|^2 = |\alpha_n|^2 + \cdots + |\alpha_m|^2. $$

This shows that $\sum_{n=1}^N \alpha_n e_n$ is a Cauchy sequence in $H$ if and only if $\sum_{n=1}^N |\alpha_n|^2$ is
Cauchy in $\mathbb{C}$ (Example 7.20(1)). Since $H$ and $\ell^2$ are complete, $\sum_n \alpha_n e_n$ converges
if, and only if, $(\alpha_n)$ is in $\ell^2$. □

The convergence of $\sum_n \alpha_n e_n$ need not be absolute in infinite dimensions; for
the latter to be true requires that $\sum_n \|\alpha_n e_n\| = \sum_n |\alpha_n|$ converges, that is, $(\alpha_n) \in
\ell^1 \subset \ell^2$. Nevertheless, a rearrangement $\sigma$ of an orthonormal basis does not affect
the expansion, $\sum_n \alpha_n e_n = \sum_n \alpha_{\sigma(n)} e_{\sigma(n)}$, because $e_{\sigma(n)}$ remain orthonormal and
$(\alpha_{\sigma(n)}) \in \ell^2$.

Theorem 10.31 Bessel’s inequality 


If $\{e_1, e_2, \dots\}$ are orthonormal in an inner product space, then

$$ \sum_n |\langle e_n, x\rangle|^2 \le \|x\|^2. $$

When $\{e_n\}$ is an orthonormal basis of a Hilbert space,

$$ x = \sum_n \langle e_n, x\rangle e_n. $$


Proof (i) Fix $x$ and let $x_N := \sum_{n=1}^N \langle e_n, x\rangle e_n$. Writing $\alpha_n := \langle e_n, x\rangle$, we have

$$ 0 \le \|x - x_N\|^2 = \|x\|^2 - \langle x_N, x\rangle - \langle x, x_N\rangle + \langle x_N, x_N\rangle $$
$$ = \|x\|^2 - 2\sum_{n=1}^N \bar\alpha_n \alpha_n + \sum_{n,m=1}^N \bar\alpha_n \alpha_m \langle e_n, e_m\rangle $$
$$ = \|x\|^2 - \sum_{n=1}^N |\alpha_n|^2, $$

hence

$$ \sum_{n=1}^N |\langle e_n, x\rangle|^2 \le \|x\|^2. \qquad (10.3) $$

As a bounded increasing series, the left-hand side must converge as $N \to \infty$, and
Bessel’s inequality holds.

As a matter of fact, even if $\{e_i\}$ is an uncountable orthonormal set of vectors,
the same analysis can be made for any finite subset of them. Inequality (10.3) then
shows that there can be at most $N - 1$ vectors $e_i$ with $|\langle e_i, x\rangle|^2 > \|x\|^2/N$, for
any positive integer $N$, and so only a countable number of terms with $\langle e_i, x\rangle \ne 0$.
Therefore $\sum_i |\langle e_i, x\rangle|^2$ is in fact a countable sum, bounded above by $\|x\|^2$.

(ii) By the previous proposition, the series $\sum_n \langle e_n, x\rangle e_n$ converges in a Hilbert space,
say to $y \in H$. But $x - y \in \{e_1, e_2, \dots\}^\perp = 0$, since for all $N \in \mathbb{N}$,

$$ \langle e_N, x - y\rangle = \langle e_N, x\rangle - \sum_{n=1}^{\infty} \langle e_n, x\rangle\langle e_N, e_n\rangle = 0. $$

An orthonormal basis is thus a Schauder basis. □

Proposition 10.32 

Every $N$-dimensional Hilbert space is unitarily isomorphic to $\mathbb{R}^N$ or $\mathbb{C}^N$.
Every separable infinite-dimensional Hilbert space is unitarily isomorphic
to $\ell^2$ (real or complex).


Proof Suppose $H$ is a separable Hilbert space, with some dense countable subset
$A = \{a_1, a_2, \dots\}$. The Gram-Schmidt process converts this to a list of orthonormal
vectors $E = \{e_1, e_2, \dots\}$, which is then a countable orthonormal basis of $H$ since

$$ \overline{[\![E]\!]} = \overline{[\![A]\!]} \supseteq \bar{A} = H. $$

Consider the map

$$ J : H \to \ell^2, \qquad x \mapsto (\alpha_n), \quad \alpha_n := \langle e_n, x\rangle. $$

Bessel’s inequality shows that $(\alpha_n)$ is indeed in $\ell^2$ (if $H$ is a real Hilbert space, the $\alpha_n$
are also real). Linearity of $J$ follows from that of the inner product. Preservation
of the inner products and norms, $\langle x, y\rangle_H = \langle Jx, Jy\rangle_{\ell^2}$, is precisely the content of
Parseval’s identity.

$J$ is onto: for any $(\alpha_n) \in \ell^2$, the series $\sum_n \alpha_n e_n$ converges to some vector $x$ by
Proposition 10.30, and this is mapped by $J$ to

$$ Jx = \Big(\big\langle e_n, \sum_m \alpha_m e_m\big\rangle\Big) = (\alpha_n). $$

The Hilbert space is $N$-dimensional precisely when $E$ has $N$ vectors; in this case
it is a classical basis of $H$. $J$ remains a surjective isometry, with $\mathbb{R}^N$ or $\mathbb{C}^N$ replacing
$\ell^2$. □


Fig. 10.5 Fourier

Joseph Fourier (1768-1830) A Napoleonic supporter, almost
guillotined in the aftermath of the French revolution, he succeeded
his teacher Lagrange in 1797. Besides being a government
official and an accomplished Egyptologist, his mathematical
work culminated in his 1822 book on Fourier series:
“sines and cosines as the atoms of all functions”; it revolutionized
how differential equations were solved. But Lagrange had
pointed out that the expansion might not be unique, or even
exist. Which functions have a Fourier series? This question led
to refined treatments of integration such as Riemann’s, and to
Cantor’s set theory; but also to studies into what convergence
of functions is all about, when it is not pointwise.


Examples of Orthonormal Bases 

Orthonormal bases are widely used to approximate functions, and are indispensable 
for actual calculations. There are various orthonormal bases commonly used for the 
space of L 2 functions on different domains. Each basis has particular properties that 
are useful in specific contexts. One should treat these in the same way that one treats 
bases in finite-dimensional vector spaces — a suitable choice of basis may make a 
problem amenable. For example, for a problem that has spherical symmetry, it would 
probably make sense to use an orthonormal basis adapted to spherical symmetry. 

Consider the simplest domain, the real line. There are three different classes of
non-empty closed intervals (up to a homeomorphism): $[a, b]$, $[a, \infty[$, and $\mathbb{R}$. Various
orthonormal bases have been devised for each, with the most popular being listed
here.

L 2 [a, b ] — Fourier series 
Proposition 10.33 


The functions $e^{2\pi inx}$, $n \in \mathbb{Z}$, form an orthonormal basis for $L^2[0, 1]$.


Proof Orthonormality of the functions is trivial to establish:

$$ \langle e^{2\pi inx}, e^{2\pi imx}\rangle = \int_0^1 e^{2\pi i(m-n)x}\,dx = \delta_{nm}. $$

Suppose $f \in \{e^{2\pi inx}\}^\perp$, i.e., $\int_0^1 e^{-2\pi inx} f(x)\,dx = 0$ for all $n \in \mathbb{Z}$. Recall that the
Fourier coefficients give a 1-1 operator $\mathcal{F} : L^1[0, 1] \to c_0(\mathbb{Z})$ (Theorem 9.25) (note:
$L^2[0, 1] \subset L^1[0, 1]$), so $\mathcal{F}f = 0$ implies $f = 0$ and hence $\{e^{2\pi inx} : n \in \mathbb{Z}\}^\perp = 0$.

□

Of course, there is nothing special about the interval $[0, 1]$. Any other interval
$[a, b]$ has a modified Fourier basis. For example, $\{\frac{1}{\sqrt{2\pi}} e^{inx} : n \in \mathbb{Z}\}$ is an
orthonormal basis for $L^2[-\pi, \pi]$.

Examples 10.34 

1. ► The Fourier expansion becomes, for $f \in L^2[0, 1]$,

$$ f(x) = \sum_{n=-\infty}^{\infty} \alpha_n e^{2\pi inx} $$

where $\alpha_n = \langle e^{2\pi inx}, f\rangle = \int_0^1 e^{-2\pi inx} f(x)\,dx$ are the Fourier coefficients of $f$,
and the convergence is in $L^2[0, 1]$, not necessarily pointwise. (However, a difficult
proof [39] shows that there is pointwise convergence a.e.; see also
Example 11.29(5).)

2. The classical Parseval identity is

$$ \int_{-\pi}^{\pi} |f(x)|^2\,dx = \sum_{n=-\infty}^{\infty} |a_n|^2 + |b_n|^2, $$

where $a_n - ib_n = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} e^{-inx} f(x)\,dx$ are the $L^2[-\pi, \pi]$-Fourier coefficients.

3. Fourier series have a wide range of applications, especially in signal processing.
For example, the operator $\mathcal{F}^* 1_{[-N,N]} \mathcal{F}$ is called a low(frequency)-pass filter:

Given a signal $f$, $1_{[-N,N]}$ discards the higher-frequency terms from the Fourier
coefficients $\mathcal{F}f$; $\mathcal{F}^*$ then builds a function from the remaining coefficients,
resulting in a smoothed out low frequency band signal (for example, without a
high frequency hiss).
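The low-pass filter of Example 3 above has a direct discrete analogue. The sketch below is an illustration under stated assumptions: the signal is sampled at equally spaced points of $[0, 1]$, the discrete Fourier transform (NumPy's `fft`) stands in for $\mathcal{F}$, and the function name `low_pass` and the test signal are made up for the example.

```python
import numpy as np

def low_pass(samples, cutoff):
    """Discrete analogue of F* 1_[-N,N] F: zero the coefficients with |n| > cutoff."""
    coeffs = np.fft.fft(samples)
    # With spacing d = 1/len(samples), fftfreq returns the integer frequencies n.
    freqs = np.fft.fftfreq(len(samples), d=1.0 / len(samples))
    coeffs[np.abs(freqs) > cutoff] = 0
    return np.fft.ifft(coeffs).real

n_samples = 256
x = np.arange(n_samples) / n_samples
# A slow oscillation plus a high-frequency "hiss" at frequency 40.
signal = np.sin(2 * np.pi * x) + 0.3 * np.sin(2 * np.pi * 40 * x)
smoothed = low_pass(signal, cutoff=5)
```

Because both components are exactly periodic on the sampling grid, `smoothed` coincides with the slow oscillation $\sin 2\pi x$ up to rounding error.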


L 2 [— 1, 1] — Legendre polynomials 

We’ve seen that the set of polynomials is dense in the space $L^2[a, b]$ (Proposition 9.20)
but the simplest basis, namely $1, x, x^2, \dots$, is not orthogonal, as can be
easily verified by calculating, say, $\langle 1, x^2\rangle = (b^3 - a^3)/3$. This can be rectified by
applying the Gram-Schmidt algorithm. On the interval $[-1, 1]$, the resulting polynomials
are called the (normalized) Legendre polynomials (Fig. 10.6). The first few are

$$ \tfrac{1}{\sqrt{2}}, \quad \sqrt{\tfrac{3}{2}}\,x, \quad \sqrt{\tfrac{5}{8}}\,(3x^2 - 1), \quad \dots $$

with the general formula being

$$ p_n(x) = \frac{\sqrt{n + \frac12}}{2^n n!}\, D^n (x^2 - 1)^n. $$

These polynomials satisfy the differential equation

$$ Lp_n = -n(n+1)p_n, \quad \text{where } L = D(1 - x^2)D = (1 - x^2)D^2 - 2xD. $$

Fig. 10.6 Orthonormal bases: Legendre polynomials, Laguerre functions, Hermite functions
(The first ten functions of each basis are plotted as rows in each
image; brightness is proportional to the value of the function, mid-grey being 0)
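The construction of the normalized Legendre polynomials can be checked by carrying out Gram-Schmidt on $1, x, x^2, \dots$ with exact polynomial integration. The following is a sketch under assumptions: the helper names `inner` and `orthonormal_polynomials` are hypothetical, and NumPy's `Polynomial` class is used only as a convenient way to multiply and integrate polynomials.

```python
import numpy as np
from numpy.polynomial import Polynomial

def inner(p, q, a=-1.0, b=1.0):
    """Real L^2[a, b] inner product of two polynomials, via an antiderivative."""
    antiderivative = (p * q).integ()
    return antiderivative(b) - antiderivative(a)

def orthonormal_polynomials(count, a=-1.0, b=1.0):
    """Gram-Schmidt applied to the monomials 1, x, x^2, ... on [a, b]."""
    basis = []
    for n in range(count):
        v = Polynomial.basis(n)            # the monomial x^n
        u = v
        for e in basis:
            u = u - inner(e, v, a, b) * e  # subtract the component along e
        basis.append(u / np.sqrt(inner(u, u, a, b)))
    return basis

p = orthonormal_polynomials(3)
gram = np.array([[inner(pi, pj) for pj in p] for pi in p])
```

On $[-1, 1]$ the coefficient arrays `p[1].coef` and `p[2].coef` come out as $\sqrt{3/2}\,(0, 1)$ and $\sqrt{5/8}\,(-1, 0, 3)$ up to rounding, and `gram` is the identity matrix.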

L 2 [ 0, oo[ — Laguerre functions 

This Hilbert space does not contain any polynomials $x^n$, but their modified versions
$x^n e^{-x/2}$ do belong. A Gram-Schmidt orthonormalization of them gives the Laguerre
functions, the first few terms of which are

$$ e^{-x/2}, \quad (1 - x)e^{-x/2}, \quad (1 - 2x + \tfrac12 x^2)e^{-x/2}, \quad \dots $$

and the general formula is

$$ l_n(x) = \frac{1}{n!}\, e^{x/2} D^n (x^n e^{-x}). $$

The Laguerre functions satisfy (prove!)

$$ Sl_n = -(n + \tfrac12)\, l_n, \quad \text{where } S := DxD - x/4. $$

The Laguerre polynomials (the polynomial part of $l_n$) can also be thought of as an
orthonormal basis for $L^2_w(\mathbb{R}^+)$ with the weight $e^{-x}$.

L 2 (M) — Hermite functions 
Here, orthonormalization is performed on the functions $x^n e^{-x^2/2}$ (equivalently, take
$x^n$ in $L^2_w(\mathbb{R})$ with the weight $e^{-x^2}$) to get the Hermite functions,

$$ \frac{1}{\pi^{1/4}}\, e^{-x^2/2}, \quad \frac{\sqrt{2}}{\pi^{1/4}}\, x e^{-x^2/2}, \quad \frac{1}{\sqrt{2}\,\pi^{1/4}}\, (2x^2 - 1) e^{-x^2/2}, \quad \dots $$

To prove orthogonality, first show that $D(e^{x^2} D^n e^{-x^2}) = -2n\, e^{x^2} D^{n-1} e^{-x^2}$, and
deduce that $\langle h_n, h_m\rangle = 2n \langle h_{n-1}, h_{m-1}\rangle$. The Hermite functions satisfy

$$ Rh_n = -(2n + 1) h_n, \quad \text{where } R := D^2 - x^2. $$


Other Domains 

Some other useful orthogonal bases on $L^2(A)$ spaces are, in brief:

Circle $L^2(S^1)$: Since the circle $S^1$ is essentially the interval $[0, 2\pi]$ as far as
$L^2$-functions are concerned, the periodic Fourier functions $e^{in\theta}$ form an orthogonal
basis for it.

The Chebyshev polynomials, $T_n(\cos\theta) := \cos n\theta$, are the projection of the $\cos n\theta$
part of this Fourier basis, from the unit semi-circle to the $x$-axis $[-1, 1]$. They are thus
orthogonal on $L^2_w[-1, 1]$ with the weight $1/\sqrt{1 - x^2}$ (since $d\theta = -dx/\sqrt{1 - x^2}$).

There are many other orthonormal bases adapted to $L^2_w[a, b]$. Rodrigues’ formula
describes orthogonal functions on $L^2_w[a, b]$,

$$ f_n(x) := w(x)^{-1} D^n \big(w(x) p(x)^n\big) $$

for a quadratic polynomial $p$ with roots at the endpoints $a$, $b$, and weight function
$w$: the Legendre, Laguerre, Hermite, and Chebyshev functions are all of this type.

Plane $L^2(\mathbb{R}^2)$: An orthonormal basis for the plane can be obtained by multiplying
Hermite functions $h_n(x) h_m(y)$. In general, if $e_n(x)$ and $\tilde{e}_m(y)$ are orthonormal bases
for $L^2(A)$ and $L^2(B)$, then $e_n(x)\tilde{e}_m(y)$ form an orthonormal basis of $L^2(A \times B)$.

Disk $L^2(B_1(0))$ Bessel functions: The functions on the unit disk taking the value
zero at the boundary have an orthogonal basis $J_n(\lambda_{m,n} r) e^{in\theta}$, where $\lambda_{m,n}$ are the
zeros of the Bessel function $J_n(x) := \sum_{m=0}^{\infty} \frac{(-1)^m}{m!\,(n+m)!} (x/2)^{2m+n}$ (Fig. 10.7).

Sphere $L^2(S^2)$ Spherical Harmonics:

$$ Y_l^m(\theta, \phi) := \sqrt{\frac{(2l + 1)(l - m)!}{4\pi (l + m)!}}\; P_l^m(\cos\theta)\, e^{im\phi}, $$

where $P_l^m(x) = (-1)^m (1 - x^2)^{m/2} D^m P_l(x)$ are the “associated Legendre functions”.
They depend on two indices, $l \in \mathbb{N}$ and $m = -l, \dots, l$.

Fig. 10.7 Bessel’s functions, $J_n(\lambda_{m,n} r)\cos(n\theta)$, $n, m = 0, 1, 2$


Exercises 10.35 

1. Orthonormal vectors must be linearly independent.

2. In finite dimensions, orthonormal bases span the vector space, $[\![e_1, \dots, e_N]\!] = H$
(Theorem 8.22).

In infinite dimensions, an orthonormal basis is not a basis in the linear algebra
sense (Hamel basis), which requires the stronger spanning condition $[\![E]\!] = H$.

3. Comparing coefficients: if $\sum_n \alpha_n e_n = \sum_n \beta_n e_n$, then $\alpha_n = \beta_n$.

4. If $\{e_n\}$ and $\{\tilde{e}_m\}$ are orthonormal bases for Hilbert spaces $X$ and $Y$ respectively,
then $\{(e_n, 0)\} \cup \{(0, \tilde{e}_m)\}$ form an orthonormal basis for $X \times Y$
(Exercise 10.10(6)).

5. Let $E := \{e_1, e_2, \dots\}$ be a set of orthonormal vectors, with $\overline{[\![E]\!]} = M \subseteq H$.
For any $x \in H$, the sum $\sum_n \langle e_n, x\rangle e_n$ gives the closest point $x_*$ in $M$ to $x$.

6. ► An operator $U \in B(H_1, H_2)$ is a unitary isomorphism if, and only if, it maps
orthonormal bases to orthonormal bases.

7. * It is quite possible for $x = \sum_n \langle e_n, x\rangle e_n$ to hold true for all $x$ in a Hilbert
space, without $e_n$ being orthonormal. Find three such vectors $e_1, e_2, e_3$ in $\mathbb{R}^2$.

But if Parseval’s identity $\|x\|^2 = \sum_n |\langle e_n, x\rangle|^2$ holds for all $x \in H$, and $\|e_n\| = 1$
for all $n$, then the vectors $e_n$ form an orthonormal basis.

8. Expand the function $x$ on $[0, 1]$ as a Fourier series.

(a) Assuming pointwise convergence, deduce Gregory’s formula

$$ 1 - \frac13 + \frac15 - \frac17 + \cdots = \frac{\pi}{4}. $$

(b) Use Parseval’s identity to deduce Euler’s formula

$$ 1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots = \frac{\pi^2}{6}. $$

9. When $f \in L^2[0, 1]$ is an even function about $\frac12$, meaning $f(\frac12 + x) = f(\frac12 - x)$,
then $\alpha_{-n} = \alpha_n$ and

$$ \sum_{n=-\infty}^{\infty} \alpha_n e^{2\pi inx} = \alpha_0 + \sum_{n=1}^{\infty} 2\alpha_n \cos(2\pi nx). $$

What if $f$ is odd, or neither odd nor even?

10. Show that $\cos n\pi x$, $n = 0, 1, \dots$, is an orthogonal basis for the real space
$L^2[0, 1]$.

11. Show that $Uf(x) := \frac{1}{\sqrt{\beta - \alpha}}\, f\big(\frac{x - \alpha}{\beta - \alpha}\big)$ is a unitary operator $L^2[0, 1] \to L^2[\alpha, \beta]$.
Hence find an orthonormal basis for $L^2[\alpha, \beta]$.

12. ► The Fourier operator $\mathcal{F} : L^2[0, 1] \to \ell^2$ is a unitary isomorphism between
Hilbert spaces. Its adjoint is $\mathcal{F}^*(\alpha_n) = \sum_{n=-\infty}^{\infty} \alpha_n e^{2\pi inx}$.

13. Prove that the Legendre polynomials are orthonormal in $L^2[-1, 1]$, as follows:
Define $u_n(x) := (x^2 - 1)^n$, and $q_n := D^n u_n$; show by induction that

(a) $D^k u_n(\pm 1) = 0$, for $k < n$,

(b) $\langle D^n u_n, D^m u_m\rangle = -\langle D^{n-1} u_n, D^{m+1} u_m\rangle$,

(c) $\langle q_n, q_m\rangle = 0$ unless $n = m$.

14. * The Legendre polynomials $P_n := p_n/\sqrt{n + \frac12}$ have the property,

$$ 1/\|u - y\| = \sum_{n=0}^{\infty} P_n(\cos\theta)\, r^n $$

where $u$ is a unit vector, $r := \|y\| < 1$, and $\theta$ is the angle between $u$ and $y$.
(Hint: Show $f_r(x) := 1/\sqrt{1 + r^2 - 2rx}$ satisfies $Lf_r = -r\,\partial_r^2(r f_r)$, then write
$f_r(x) = \sum_n \alpha_n(r) p_n(x)$.)

15. ► A frame is a sequence of vectors $e_n \in H$ (not necessarily linearly independent)
for which the mapping $J : x \mapsto (\langle e_n, x\rangle)_{n \in \mathbb{N}}$ is an embedding $H \to M \subseteq \ell^2$.
By Proposition 7.12, this is equivalent to there being positive constants $a, b > 0$,
$a\|x\|_H \le \|Jx\|_{\ell^2} \le b\|x\|_H$, i.e.,

$$ \exists c > 0, \quad \frac{1}{c}\|x\|^2 \le \sum_n |\langle e_n, x\rangle|^2 \le c\|x\|^2. $$

Let $\delta_n(a_m) := a_n$ and $L := (J^*)^{-1}$; then $x \mapsto \delta_n Lx$ is a continuous functional,
hence there is a unique vector $\tilde{e}_n$ such that $\delta_n Lx = \langle \tilde{e}_n, x\rangle$.

(a) The two sets of vectors $e_n$ and $\tilde{e}_n$ are bi-orthogonal, that is, $\langle \tilde{e}_m, e_n\rangle = \delta_{mn}$.

(b) $J^*L = I = L^*J$, so

$$ x = \sum_n \langle \tilde{e}_n, x\rangle e_n = \sum_n \langle e_n, x\rangle \tilde{e}_n. $$
Applications 

Frequency-Time Orthonormal Bases 

An improvement on the classical orthonormal bases for functions $t \mapsto f(t)$ in
$L^2(\mathbb{R})$ are bases that give information in both ‘frequency’ and ‘time’. In contrast, the
Fourier coefficients, for example, only give information about the frequency content
of the function. A large $n$th Fourier coefficient means that there is a substantial
amount of the term $e^{2\pi int}$ somewhere in the function $f(t)$, without indicating at all
where. The aim of frequency-time bases is to have coefficients $\alpha_{m,n}$ that depend on
two parameters $n$ and $m$, one of which is a frequency index, the other a “time” index.
The $\alpha_{m,n}$ coefficients, much like musical notes placed on a score, indicate how much
of the frequency corresponding to $n$ is “played” at the time corresponding to $m$:
they are able to track the change of frequency content of $f$ with time. Of course, the
reference to $t$ as time is not of relevance here; $t$ can represent any other varying real
quantity.

Windowed Fourier Bases (Short Time Fourier Transform): A basic way to achieve
this is to define the basis functions by

$$ h_{m,n}(t) := e^{2\pi int} h(t - m), $$

where $h$ is a carefully chosen (real) window function, with $\|h\|_{L^2} = 1$, such that
$h_{m,n}$ are orthonormal. The simplest choice of window function is $h = 1_{[-\frac12, \frac12]}$; other
popular possibilities, such as the Hann window $\cos^2(\pi t)$ ($-\frac12 \le t \le \frac12$) and the
Gaussian $c_\sigma e^{-t^2/2\sigma^2}$, do not give orthonormal bases but are useful nonetheless.

One can then obtain a picture of $f$ spread out in time and frequency, called a
spectrogram (Fig. 10.8), by plotting the coefficients $|\langle h_{m,n}, f\rangle|^2$ (often letting $m$ and
$n$ vary continuously in $\mathbb{R}$ and $\mathbb{R}^+$ respectively to get a smooth picture).

Fig. 10.8 Spectrogram of a piano piece (horizontal axis: time $t$), showing clearly the duration,
frequency, and harmonics of each note

Note that the coefficients $\langle h_{m,n}, f\rangle$ are really just $(\alpha_{m,n}) = \mathcal{F}(h(t - m) f(t))$.
So summing the coefficients in $n$, keeping the position $m$ fixed, gives the windowed
function:

$$ \sum_n \alpha_{m,n} e^{2\pi int} = h(t - m) f(t) $$

and similarly, when $\sum_m h(t - m) = 1$,

$$ \hat{f}(n) = \int e^{-2\pi int} \sum_m h(t - m) f(t)\,dt = \sum_m \alpha_{m,n}. $$

The greatest disadvantage of these bases is that the window ‘width’ is predetermined;
it ought to be large enough to contain the low frequency oscillations, but then the
time localization of the high frequencies is lost. The aim of the windowed Fourier
basis is only achieved over a limited range of frequencies. To circumvent this, one
can make the window width decrease with the frequency parameter $n$; this is the
idea of wavelets.
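A discrete spectrogram with the simplest (rectangular) window can be sketched as follows. The setting is an assumption made for illustration: the signal is sampled at `rate` points per unit time, each unit interval plays the role of one window position $m$, and the function name `windowed_fourier` is hypothetical.

```python
import numpy as np

def windowed_fourier(samples, window_len):
    """Coefficients alpha[m, n]: unitary DFT of the m-th block (rectangular window)."""
    n_blocks = len(samples) // window_len
    frames = np.asarray(samples[: n_blocks * window_len]).reshape(n_blocks, window_len)
    return np.fft.fft(frames, axis=1, norm="ortho")

rate = 64                                   # samples per unit of time
t = np.arange(8 * rate) / rate              # eight window positions m = 0..7
# Frequency 4 is "played" during the first half, frequency 16 during the second.
f = np.where(t < 4, np.sin(2 * np.pi * 4 * t), np.sin(2 * np.pi * 16 * t))

alpha = windowed_fourier(f, rate)
spectrogram = np.abs(alpha) ** 2            # |<h_{m,n}, f>|^2
dominant = spectrogram[:, : rate // 2].argmax(axis=1)   # strongest frequency per window
```

Unlike the plain Fourier coefficients of $f$, the array `dominant` shows when each frequency occurs: it equals 4 for the first four windows and 16 for the last four. Since the blocks do not overlap and the transform is unitary, the entries of `spectrogram` sum to the energy $\sum f^2$ of the samples.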

Wavelet Bases: The basis in this case consists of the following functions in $L^2(\mathbb{R})$

$$ \psi_{m,n}(t) := T_m S_{2^n} \psi(t) = 2^{n/2} \psi(2^n t - m), \qquad (m, n \in \mathbb{Z}) $$

where $\psi$ and $\phi$ are carefully chosen ‘mother’ and ‘father’ functions in $L^2(\mathbb{R})$. The
function $\psi$ serves both as a window (ideally with compact support) and an oscillation.
The basis functions $\psi_{m,n}$ are then scaled and translated versions of $\psi$. They
have the advantage that the resolution in ‘time’ is better for higher frequencies than
the windowed Fourier bases, and so require fewer coefficients to represent a function
to the same level of detail. One example is the classical Haar basis, generated
by $\psi(t) := 1_{[0, \frac12]} - 1_{[\frac12, 1]}$ (prove orthogonality of $\psi_{m,n}$). Other wavelets, generated
by continuous functions, are more popular, e.g. Mexican-hat ($(1 - t^2)e^{-t^2/2}$),
Gabor/Morlet ($e^{2\pi ift} e^{-t^2/2}$, usually $f = 1$; Fig. 10.9). The analogue of the spectrogram
is the scalogram, which is a plot of the coefficients $Wf(a, b) := \langle \psi_{a,b}, f\rangle$
where $\psi_{a,b}$ denotes $\psi$ scaled by $a$ and translated by $b$.

Fig. 10.9 Three wavelets: Haar, Mexican hat (with a translated and scaled version), and Morlet
(real and imaginary parts)

In a multi-resolution wavelet scheme, a subspace $V_k$ of the Hilbert space $L^2(\mathbb{R})$
is split recursively into low and high resolution parts as $V_{n+1} = V_n \oplus W_n$, where
$W_n = V_n^\perp \cap V_{n+1}$,

$$ V_k = V_{k-1} \oplus W_{k-1} = \cdots = V_0 \oplus W_0 \oplus W_1 \oplus \cdots \oplus W_{k-1}. $$

If we suppose $V_n$ and $W_n$ to be spanned by orthonormal bases $\{\phi_{m,n} : m =
0, \dots, N - 1\}$ and $\{\psi_{m,n} : m = 1, \dots, N - 1\}$, that are generated by scaling
and translation from “father” and “mother” wavelets $\phi$ and $\psi$ respectively, then,
by recursion, one need only ensure $V_1 = V_0 \oplus W_0$ for this scheme to
work. Therefore the requirements are that $\phi, \psi \in V_1$ be orthonormal. For $N$ even,
the following “refinement equations” are sufficient.

$$ \phi(x) = a_0 \phi(2x) + a_1 \phi(2x - 1) + \cdots + a_{N-1} \phi(2x - N + 1), $$
$$ \psi(x) = a_{N-1} \phi(2x) - a_{N-2} \phi(2x - 1) + \cdots - a_0 \phi(2x - N + 1), $$
$$ a_0^2 + \cdots + a_{N-1}^2 = 2. $$

Recall here that $\phi(2x - m) = 2^{-1/2} \phi_{m,1}(x)$ has norm $1/\sqrt{2}$, so $\|\phi\|^2 = \sum_m a_m^2/2$.
For example, the Haar basis satisfies $\phi(t) = \phi(2t) + \phi(2t - 1)$, $\psi(t) = \phi(2t) -
\phi(2t - 1)$. The Daubechies wavelet basis of order $N$ is a multi-resolution scheme
with an optimal choice of coefficients $a_i$, in which the wavelet $\psi$ is taken to be of
compact support and ‘smooth’ (more precisely, with $N$ zero moments; see [27]).
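The recursive splitting $V_{n+1} = V_n \oplus W_n$ is easiest to see for the Haar basis, where the refinement equations reduce to sums and differences of neighbouring values. The following is a minimal sketch of the discrete orthonormal Haar transform, assuming a signal whose length is a power of 2; the function names are illustrative and the factor $1/\sqrt{2}$ is the normalization that keeps each step orthonormal.

```python
import numpy as np

def haar_step(v):
    """Split coefficients in V_{n+1} into a coarse part (V_n) and details (W_n)."""
    even, odd = v[0::2], v[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_transform(v):
    """Full decomposition V_k = V_0 + W_0 + ... + W_{k-1}; length must be a power of 2."""
    v = np.asarray(v, dtype=float)
    details = []
    while len(v) > 1:
        v, d = haar_step(v)
        details.append(d)
    return v, details[::-1]                 # coarsest details (W_0) first

signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
coarse, details = haar_transform(signal)
energy = coarse @ coarse + sum(d @ d for d in details)
```

Because every step is an orthogonal change of basis, `energy` equals the squared norm of `signal` (Parseval’s identity), and a constant signal has all its detail coefficients equal to zero.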

Solving Linear Equations 

Orthonormal expansions can be used to solve linear equations $Tx = y$, where $x$ and
$y$ are elements of some (separable) Hilbert space, and $T$ an operator on it. Given
an orthonormal basis $\{e_n\}$, the vectors $x$ and $y$ can be written in terms of it as
$x = \sum_n \alpha_n e_n$ and $y = \sum_n b_n e_n$. Of these, the scalar coefficients $\alpha_n := \langle e_n, x\rangle$
are unknown and to be determined, but $b_n := \langle e_n, y\rangle$ can be calculated explicitly.
Substituting into $Tx = y$ we get

$$ \sum_n \alpha_n T e_n = \sum_n b_n e_n. $$

Moreover the vectors $Te_n$ can also be expanded as $Te_n = \sum_m T_{m,n} e_m$ for some numbers
$T_{m,n} = \langle e_m, Te_n\rangle$. So, comparing coefficients in the equation $\sum_{n,m} \alpha_n T_{m,n} e_m =
\sum_m b_m e_m$, we find

$$ \sum_n T_{m,n} \alpha_n = b_m. $$

This can be thought of as a matrix equation in $\ell^2$ with the matrix $[T_{m,n}]$ having a
countable number of rows and columns:

$$ \begin{pmatrix} T_{11} & T_{12} & \cdots \\ T_{21} & T_{22} & \cdots \\ \vdots & & \ddots \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \end{pmatrix}. $$

It is precisely the equation $Tx = y$ written in terms of the coefficients of $T$, $x$ and $y$
in the orthonormal basis $e_n$. Effectively, the problem has been transferred from one
in $H$ to one in $\ell^2$, via the isomorphism $J : H \to \ell^2$.

For practical purposes, one can truncate the matrix and vectors to yield a finite
$N \times N$ matrix equation that can then be solved. This can be justified because the
remainder terms of $y$ and $x$, namely $\sum_{n=N+1}^{\infty} b_n e_n$, etc., converge to 0 as $N \to \infty$.

For theoretical purposes, the method is useful if the orthonormal basis elements
$e_n$ are eigenvectors of $T$, that is, $Te_n = \lambda_n e_n$. This makes the matrix of $T$ diagonal,

$$ \begin{pmatrix} \lambda_1 & & \\ & \lambda_2 & \\ & & \ddots \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \end{pmatrix}. $$

The equation is easily solved, $\alpha_n = b_n/\lambda_n$, unless $\lambda_n = 0$. If $\lambda_n = 0$ (i.e., $Tx = 0$
has non-trivial solutions) there are no solutions of $0\,\alpha_n = b_n$ unless $b_n = 0$, in which
case the $\alpha_n$ are arbitrary. Thus there will be a solution $x$ if, and only if, $b_n$ vanishes
whenever $\lambda_n$ does, or equivalently, $y \perp \ker T$. Separating the vectors $e_m$ that satisfy
$Te_m = 0$ from the rest, the complete solution is

$$ x = \sum_{m : \lambda_m = 0} \alpha_m e_m + \sum_{n : \lambda_n \ne 0} \frac{b_n}{\lambda_n} e_n, $$

where $\alpha_m$ are arbitrary constants. The first series is a solution of the “homogeneous
equation” $Tx = 0$, while the second series is a “particular solution” of $Tx = y$.
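In finite dimensions the eigenvector method can be sketched directly. The example below is an illustration under assumptions: $T$ is taken to be a real symmetric (or Hermitian) matrix so that `np.linalg.eigh` supplies an orthonormal basis of eigenvectors, and the function name `solve_in_eigenbasis` and the tolerance for deciding $\lambda_n = 0$ are hypothetical choices.

```python
import numpy as np

def solve_in_eigenbasis(T, y, tol=1e-10):
    """Particular solution of T x = y using an orthonormal eigenbasis of a Hermitian T."""
    lam, E = np.linalg.eigh(T)           # columns of E are orthonormal eigenvectors e_n
    b = E.conj().T @ y                   # b_n = <e_n, y>
    zero = np.abs(lam) < tol             # eigenvectors spanning ker T
    if np.any(np.abs(b[zero]) > tol):
        raise ValueError("no solution: y is not orthogonal to ker T")
    alpha = np.zeros_like(b)
    alpha[~zero] = b[~zero] / lam[~zero] # alpha_n = b_n / lambda_n
    return E @ alpha                     # x = sum_n alpha_n e_n

T_regular = np.array([[2.0, 1.0], [1.0, 2.0]])
x_regular = solve_in_eigenbasis(T_regular, np.array([3.0, 3.0]))

T_singular = np.array([[1.0, 1.0], [1.0, 1.0]])      # kernel spanned by (1, -1)
x_singular = solve_in_eigenbasis(T_singular, np.array([2.0, 2.0]))
```

Both calls return $(1, 1)$ up to rounding. In the singular case this is only the particular solution; any multiple of $(1, -1)$ may be added to it, and a right-hand side such as $(1, 0)$, which is not orthogonal to the kernel, raises an error.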

For the case of the Hilbert space $L^2(A)$, with $e_n$ and $y = f$ all functions, the
particular solution can be rewritten as

$$ \sum_n \frac{b_n}{\lambda_n} e_n(x) = \sum_n \frac{\langle e_n, f\rangle}{\lambda_n} e_n(x) = \int_A \Big( \sum_n \frac{\overline{e_n(s)}\, e_n(x)}{\lambda_n} \Big) f(s)\,ds. $$

The kernel $G(x, s) := \sum_n \overline{e_n(s)}\, e_n(x)/\lambda_n$ is called the Green’s function of the
operator $T$.


Gaussian quadrature 

A central problem in numerical analysis is to find an approximation for the integral 
of a real function, in the form 

/ ~ fli/(xi)H F a N f(x N )=: </>(/), 

where , x, are fixed numbers; note that 0 is a functional acting on /. The familiar 
trapezoid rule and Simpson’s rule are of this type, where the x„ are equally spaced 
along [a,b]. The question arises as to whether we can do better by choosing x n in 
some other optimal way. 

Let e„(x) be real orthonormal polynomials of degree n in the space L 2 [a, b], 
obtained from 1, x, x 2 , ..., by the Gram-Schmidt process. By orthogonality, their 
integrals vanish since e„ = (1, e n ) = 0, except for /j’ eo = || 1 II z, 2 [ a ,fc] - Certainly, 
for 4>{e n ) to agree with the integral J’j’ e n for n = 1, . . . , N — 1, we must require 

/ e 0 (xi) ... e 0 (x N ) \ / v 

ei(xi) ... ei(x N ) ai 

\e N -i(xi) . . . e N -\{x N )j \ aN ' 

which can be solved for a n when x n are known. The main point of Gaussian quadrature 
is that if x n are chosen to be the N roots of the polynomial c,y(v) (assuming they lie 
in [a, b ]), we also get J ^ e„ = 0 = 4>(e n ) for n < 2 N — 1. 

For consider the division of any e := e m (1 ^ m 0 2 N — 1) by c,y, e = qe^ + r 
where q and r are real polynomials of degree at most A — 1 . Then, as eo is proportional 
to 1, and q e [I,*, . . . = fe 0 , . . . , e w _i]], 


/wn\ 

0 

V 0 ) 



0 = (1, e) = f qe N +r = (q, e N ) + (1, r) = (1, r). 


Hence r = XitL/ e k f° r some scalars bk, and by the choice of the coefficients a„, 
and eN(x n ) = 0, 


e(x n ) = q(x n )ejv (x„) + r(x„) = r(x„), 

N N N—l N b 

so 4>(e) = ^ a n e(x n ) = ^a n r(x n ) = ^ b k '^a n e k {x n ) = 0= e. 

n = 1 n = 1 ^=1 n = 1 a 


Thus the integral of any / = ]T (1 a n e n e L 2 [a, b ] agrees with 0(f) up to order 
n = IN — 1, 
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21V— 1 


y. a n <j)(e n ) « </>(/). 


ft 


The residual error can be made as small as needed by taking a larger N. 
For example, using the Legendre polynomials, (prove!) 



0.35/(— 0.86) + 0.65/(— 0.34) + 0.65/(0.34) + 0.35/(0.86). 


All this applies equally well for weighted $L^2(A)$ spaces; for example, using Laguerre
polynomials,

$$\int_0^\infty f(x) e^{-x}\,dx \approx 0.60 f(0.32) + 0.36 f(1.75) + 0.039 f(4.5) + 0.00054 f(9.4).$$


In practice, the algorithm of choice of most mathematics software is currently the 
Gauss-Kronrod algorithm, which performs Gaussian quadrature but refines it adap- 
tively by taking more evaluation points if necessary. 
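
The four-point Legendre rule quoted above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `legendre`, `gauss_legendre`, and `quadrature` are illustrative choices, not taken from the text. It uses the classical (unnormalized) Legendre polynomials, which have the same roots as the orthonormal $e_N$ on $[-1,1]$, locates the roots by Newton's method, and uses the standard closed form $2/((1-x^2)P_N'(x)^2)$ for the weights rather than solving the matrix equation directly.

```python
import math

def legendre(n, x):
    """Return (P_n(x), P_n'(x)) using the three-term recurrence."""
    if n == 0:
        return 1.0, 0.0
    p_prev, p = 1.0, x                      # P_0 and P_1
    for k in range(2, n + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    dp = n * (x * p - p_prev) / (x * x - 1.0)   # valid for x != +-1
    return p, dp

def gauss_legendre(n):
    """Nodes (roots of P_n) and weights of the n-point rule on [-1, 1]."""
    pairs = []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))   # initial guess
        for _ in range(100):                              # Newton iteration
            p, dp = legendre(n, x)
            step = p / dp
            x -= step
            if abs(step) < 1e-15:
                break
        _, dp = legendre(n, x)
        pairs.append((x, 2.0 / ((1.0 - x * x) * dp * dp)))
    pairs.sort()
    return [x for x, _ in pairs], [w for _, w in pairs]

def quadrature(f, n):
    """Approximate the integral of f over [-1, 1] with the n-point rule."""
    nodes, weights = gauss_legendre(n)
    return sum(w * f(x) for x, w in zip(nodes, weights))

if __name__ == "__main__":
    nodes, weights = gauss_legendre(4)
    print([round(x, 2) for x in nodes], [round(w, 2) for w in weights])
    print(quadrature(lambda x: x ** 6, 4), 2 / 7)   # degree 6 <= 2N - 1
```

With $N = 4$ the rule integrates polynomials up to degree $2N - 1 = 7$ exactly (up to rounding), while a degree-8 monomial already shows a visible error.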

Signal Processing 

Sounds, images, and signals in general can be thought of as vectors in $L^2(\mathbb{R})$, $L^2(\mathbb{R}^2)$,
and $L^2(A)$ respectively. They can thus be decomposed into orthonormal sums with
all the advantages that that entails. Three applications are: 

(a) Storing only the “largest” coefficients $a_n := \langle e_n, x\rangle$ of an orthonormal expansion
leads to a useful compressed form of the vector $x$. Compression ratios of about
100 are quite typical. A close copy of $x$ can easily be regenerated from these
coefficients using $x \approx \sum_n a_n e_n$. Although not identical to the original (because
the small terms were omitted), it may be good enough for the purpose, especially 
since the smallest coefficients are usually unappreciated fine detail or noise. 

(b) A vector can be altered intentionally by manipulating its coefficients. For exam- 
ple, it can be improved by filtering out noise coefficients, or particular features 
in a function may be picked out, e.g. image contrast may be enhanced if certain 
coefficients are weighted more than others. 

(c) A vector may be matched with a database of other vectors, by taking the inner
product with each of them, using Parseval’s identity $\langle x, y\rangle = \sum_n \bar{a}_n b_n$ (where
$b_n := \langle e_n, y\rangle$). That vector with the largest correlation $\langle x, y\rangle$ gives the best
match and can be selected for further investigation.
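
Applications (a) and (c) can be illustrated in a few lines. The sketch below is only an illustration under stated assumptions: it works in $\mathbb{R}^8$ instead of $L^2$, it uses a discrete cosine basis as the orthonormal basis $e_n$, and the helper names (`dct_basis`, `compress`, `reconstruct`) are hypothetical. It keeps the `keep` largest coefficients, rebuilds an approximation, and relies on Parseval's identity: the squared error equals the sum of the squares of the dropped coefficients.

```python
import math

def dct_basis(N):
    """Orthonormal cosine basis e_0, ..., e_{N-1} of R^N (one vector per row)."""
    basis = []
    for n in range(N):
        scale = math.sqrt((1.0 if n == 0 else 2.0) / N)
        basis.append([scale * math.cos(math.pi * n * (k + 0.5) / N)
                      for k in range(N)])
    return basis

def inner(u, v):
    """Real inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def compress(x, basis, keep):
    """Store only the `keep` largest coefficients a_n = <e_n, x>."""
    coeffs = [inner(e, x) for e in basis]
    order = sorted(range(len(coeffs)), key=lambda n: abs(coeffs[n]), reverse=True)
    return {n: coeffs[n] for n in order[:keep]}

def reconstruct(stored, basis):
    """Regenerate the approximation: sum of a_n e_n over the stored terms."""
    x = [0.0] * len(basis[0])
    for n, a in stored.items():
        for k, e_nk in enumerate(basis[n]):
            x[k] += a * e_nk
    return x

if __name__ == "__main__":
    basis = dct_basis(8)
    x = [10.0, 11.0, 12.5, 13.0, 13.0, 12.0, 11.5, 10.0]
    stored = compress(x, basis, keep=3)
    print(sorted(stored))                              # indices that were kept
    print([round(v, 2) for v in reconstruct(stored, basis)])
```

For matching, as in (c), the correlation $\langle x, y\rangle$ can be computed directly from the stored coefficient lists of $x$ and $y$ without regenerating either vector.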

Consequently, the storage, transmission, rapid retrieval, and comparison of images 
and sounds have seen a tremendous change in the past two decades, in part feeding 
the growth not only of the internet and mobile phones, but also of new scientific tools. 
For example, speech-, handwriting-, and face-recognition software find phonemes, 
characters, and faces that best match the given input; an E.C.G./E.K.G. or E.E.G. 
signal may be compared to a database for the early detection of cardiac arrest or 
epileptic fits; the U.S. F.B.I. performs more than 50,000 fingerprint matches daily,
etc. 

To see one application in some detail, let us look at one popular image format — 
JPEG (1992 standard). Color images consist of an array of pixels, each digitized into 
three numbers $(R, G, B) \in [0,1]^3$ representing the red, green, and blue content.
In the JPEG algorithm, the three RGB color bytes for each pixel are usually first 
converted to brightness, excess red, and excess blue, 


$$Y := rR + gG + bB,$$

$$C_r := \frac{1}{2} + \frac{1}{2(g+b)}(R - Y),$$

$$C_b := \frac{1}{2} + \frac{1}{2(r+g)}(B - Y),$$

where $r \approx 0.25$, $g \approx 0.65$, $b \approx 0.1$ are agreed-upon constants such that $r + g + b = 1$.
This is done to avoid effects due to color-shifts and because the brightness picture 
carries most of the visible information; in fact the excess red/blue pixels are reduced 
in number by a factor of 4 because the eye is not sensitive to fine detail in pure color. 

The image is then split into $8 \times 8$ blocks, and each block is expanded with respect
to the cosine basis $\cos(\pi n(x + \tfrac12)/8)\cos(\pi m(y + \tfrac12)/8)$ (the cosine transform is
preferred for positive functions in general because the first few coefficients are larger;
however it is not so good for sharp lines). The resulting 64 coefficients for each block
are discretized (by multiplying by a user-defined weight, and taking the integer part).
Most are now zero, and the rest are squeezed further using the standard Huffman
compression algorithm. This way, a 4 Mpixel image, that normally requires 12 million
bytes in raw formats, can easily be reduced a hundredfold in file-size without any
visible loss of quality. JPEG 2000 uses wavelets instead but works in essentially the
same way; MPEG is JPEG 1992 adapted to video.
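
The two steps just described, the change of color coordinates and the cosine expansion of an $8 \times 8$ block followed by discretization, can be sketched as follows. This is a toy illustration and not the JPEG standard itself: the weights are the approximate values quoted in the text, a single weight is used for all 64 coefficients, the basis is normalized to be orthonormal, and all function names are hypothetical.

```python
import math

# Approximate brightness weights quoted in the text (they sum to 1).
W_R, W_G, W_B = 0.25, 0.65, 0.10

def rgb_to_ycc(R, G, B):
    """Brightness, excess red, excess blue, each in [0, 1]."""
    Y = W_R * R + W_G * G + W_B * B
    Cr = 0.5 + (R - Y) / (2 * (W_G + W_B))
    Cb = 0.5 + (B - Y) / (2 * (W_R + W_G))
    return Y, Cr, Cb

def ycc_to_rgb(Y, Cr, Cb):
    """Inverse of rgb_to_ycc."""
    R = Y + 2 * (W_G + W_B) * (Cr - 0.5)
    B = Y + 2 * (W_R + W_G) * (Cb - 0.5)
    G = (Y - W_R * R - W_B * B) / W_G
    return R, G, B

N = 8
# Orthonormal cosine basis: C[n][x] is proportional to cos(pi n (x + 1/2) / 8).
C = [[math.sqrt((1.0 if n == 0 else 2.0) / N) * math.cos(math.pi * n * (x + 0.5) / N)
      for x in range(N)] for n in range(N)]

def dct2(block):
    """Coefficients of an 8x8 block with respect to the product cosine basis."""
    return [[sum(C[n][x] * C[m][y] * block[x][y]
                 for x in range(N) for y in range(N))
             for m in range(N)] for n in range(N)]

def idct2(coeffs):
    """Rebuild the block from its 64 coefficients."""
    return [[sum(C[n][x] * C[m][y] * coeffs[n][m]
                 for n in range(N) for m in range(N))
             for y in range(N)] for x in range(N)]

def quantize(coeffs, weight):
    """Multiply by a weight and take the integer part."""
    return [[int(c * weight) for c in row] for row in coeffs]

if __name__ == "__main__":
    flat = [[0.5] * N for _ in range(N)]            # a block of constant brightness
    q = quantize(dct2(flat), weight=16)
    print(sum(1 for row in q for v in row if v != 0), "non-zero coefficient(s)")
```

A block of constant brightness is represented by a single non-zero coefficient, which is the extreme case of the observation that most discretized coefficients vanish.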

Similarly a 5 min CD-quality stereo sound clip, sampled at 44,000 times 16 bits a 
second, would normally need at least 52 Mbytes. It can be compressed to about 10 % 
of that by MP3, an algorithm that works in an analogous way as JPEG, but adapted 
to sound signals. 

Remarks 10.36 

1. The norm on matrices in $B(\mathbb{C}^N, \mathbb{C}^M)$ that comes from the inner product defined
in Example 10.2(2) is not the same as that defined in Theorem 8.7 (but recall
that all norms on finite-dimensional Banach spaces are equivalent).

2. $\operatorname{Re}\langle x, y\rangle$ is a real-valued inner product (over the reals), but $\operatorname{Im}\langle x, y\rangle$ fails the
last two axioms.

3. A real inner product on the real vector space $X$ can be uniquely extended to its
complexification $X + iX$, by

$$\langle x_1 + i x_2, y_1 + i y_2\rangle := (\langle x_1, y_1\rangle + \langle x_2, y_2\rangle) + i(\langle x_1, y_2\rangle - \langle x_2, y_1\rangle).$$

Thus an inner product on $\mathbb{R}^N$ can extend in several ways to $\mathbb{R}^{2N}$, but in only one
way to $\mathbb{C}^N$.

4. There is an interesting analogy between linear subspaces and logic: Think of
subspaces as “statements”, with $A \Rightarrow B$ meaning $A \subseteq B$, and False, True, A AND
B, A OR B, NOT A, corresponding to $0$, $X$, $A \cap B$, $A + B$, and $A^\perp$, respectively.
What are the logical rules that correspond to Proposition 10.9? Are all classical
logic rules true in this sense?

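5. The polarization identity states that a complex inner product $\langle x, y\rangle$ is a weighted
average of lengths on a circle of radius $\|x\|$, centered at $y$. It can be generalized
further: if $\omega$ is a primitive $N$th root of unity, $\omega^N = 1$ ($N > 2$), then

$$\langle x, y\rangle = \frac{1}{N}\sum_{n=1}^{N} \omega^n \|y + \omega^n x\|^2.$$

Even more generally, $\langle x, y\rangle = \frac{1}{2\pi i}\oint_{S^1} \|y + zx\|^2\,dz$.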

6. A normed space with a conjugate-linear “isomorphism” $J : X \to X^*$, has a
sesquilinear product $\langle x, y\rangle := (x^*y + \overline{y^*x})/2$ (where $x^* := Jx$). The additional
property $x^*x = \|x\|^2$ turns it into an inner product space, compatible with the
norm of $X$.

7. The conjugate gradient method is an iteration to solve $T^*Tx = y$, used especially
when $T$ is a very large matrix. Note that $\langle\langle x, y\rangle\rangle := \langle x, T^*Ty\rangle$ is an inner
product when $T$ is 1-1. If $e_i$ were an orthonormal basis with respect to this inner
product, and $x = \sum_j \alpha_j e_j$, then

$$\alpha_j = \langle\langle e_j, x\rangle\rangle = \langle e_j, T^*Tx\rangle = \langle e_j, y\rangle,$$

and $x$ can be found. The iteration is essentially the Gram-Schmidt process applied
to the residual vectors $r_n = y - T^*Tx_n$, while calculating the approximate
solutions $x_n$ on the go, ($|\!|\!|x|\!|\!|^2 := \langle\langle x, x\rangle\rangle$)

$$e_0 := y/|\!|\!|y|\!|\!|, \qquad e'_{n+1} := r_n - \langle\langle e_n, r_n\rangle\rangle e_n, \qquad e_{n+1} := e'_{n+1}/|\!|\!|e'_{n+1}|\!|\!|,$$

$$x_0 := \langle e_0, y\rangle e_0, \qquad x_{n+1} := x_n + \langle e_{n+1}, y\rangle e_{n+1},$$

$$r_0 := y - T^*Tx_0, \qquad r_{n+1} := y - T^*Tx_{n+1}.$$

8. QR decomposition: any operator $T : X \to Y$ between Hilbert spaces maps
an orthonormal basis $e_i \in X$ to a sequence of vectors $Te_i \in Y$. If these are
orthonormalized to $e'_i$ using the Gram-Schmidt process, then $Te_i = \sum_{j=1}^{i} \alpha_{ij} e'_j$.
This means that, with respect to the bases $e_i$ and $e'_i$, $T$ has the upper-triangular
matrix $R$. If $Q$ represents the change of bases in $Y$ from $e'_i$ to the original one,
then the matrix of $T$ is $QR$.
9. A continuous function $f : [0, 2\pi] \to \mathbb{C}$, $f(0) = f(2\pi)$, traces out a looped path
or ‘orbit’ in the complex plane. If the Fourier coefficients are written in polar
form, it is clear that each term $a_n e^{in\theta} = r_n e^{i(n\theta + \phi_n)}$ describes a circle; and the
sum of two terms describes the motion along a circle whose center also moves
in a circle. The whole Fourier sum then represents a motion along progressively
smaller circles. Ptolemy and other Greek astronomers were the first to describe
a periodic motion in terms of these cycles within cycles.

10. A non-separable Hilbert space is still isomorphic to an $\ell^2(A)$ space, one with
an uncountable number of orthonormal basis vectors. For example every Hilbert
space with an orthonormal basis $\{ e_t \}$ where $t \in [0,1]$ is isomorphic to the space
$\ell^2[0,1]$ consisting of functions $a_t$ for which $\|a\|^2 := \sum_{t \in [0,1]} |a_t|^2 < \infty$. (Note: $a$
can take only a countable number of non-zero values.)

11. The first important application of the least-squares method was by Gauss. In 
1801, G. Piazzi found the long-sought ‘missing’ planet between the orbits of 
Jupiter and Mars, but could not observe it again after it went behind the Sun. 
Gauss managed to recover its orbital parameters from Piazzi’s observations, 
and Ceres was relocated almost a year after its discovery. Essentially the same 
techniques were used in 1846 to predict the location of a new planet, Neptune, 
from the irregularities in the observed positions of Uranus. 

12. There is a discrete version of the Fourier basis, on $L^2[0,1]$, called the Walsh
basis, which consists of step functions. For each $N = 1, 2, \ldots$, there are $2^N$
Walsh basis functions, each with a step-width of $1/2^N$, whose lists of heights are
the normalized column vectors of the Hadamard matrices.
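
The iteration of Remark 7 can be tried out on a small real matrix. The following is a minimal sketch under stated assumptions: $T$ is a real invertible matrix (so $T^*$ is the transpose), $y \neq 0$, and the helper names are hypothetical. It follows the formulas of the remark literally, orthonormalizing each residual against the previous direction in the inner product $\langle\langle u, v\rangle\rangle = \langle u, T^*Tv\rangle$; a production implementation would normally use the standard, numerically more robust form of the conjugate gradient recurrences.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def normal_matrix(T):
    """Return T*T for a real matrix T (T* is the transpose)."""
    n = len(T[0])
    return [[sum(row[i] * row[j] for row in T) for j in range(n)] for i in range(n)]

def conjugate_gradient(T, y, tol=1e-12):
    """Approximate solution of T*T x = y following Remark 7 (assumes y != 0)."""
    A = normal_matrix(T)

    def a_inner(u, v):                      # <<u, v>> := <u, T*T v>
        return dot(u, matvec(A, v))

    def combine(alpha, u, v):               # alpha * u + v
        return [alpha * s + t for s, t in zip(u, v)]

    norm_y = math.sqrt(a_inner(y, y))
    e = [c / norm_y for c in y]             # e_0 := y / |||y|||
    x = [dot(e, y) * c for c in e]          # x_0 := <e_0, y> e_0
    for _ in range(len(y) - 1):
        r = combine(-1.0, matvec(A, x), y)  # r_n := y - T*T x_n
        if math.sqrt(dot(r, r)) <= tol * math.sqrt(dot(y, y)):
            break
        d = combine(-a_inner(e, r), e, r)   # e'_{n+1} := r_n - <<e_n, r_n>> e_n
        norm_d = math.sqrt(a_inner(d, d))
        e = [c / norm_d for c in d]         # e_{n+1}
        x = combine(dot(e, y), e, x)        # x_{n+1} := x_n + <e_{n+1}, y> e_{n+1}
    return x

if __name__ == "__main__":
    T = [[2.0, 0.0, 1.0], [1.0, 3.0, 0.0], [0.0, 1.0, 4.0]]
    y = [1.0, 2.0, 3.0]
    x = conjugate_gradient(T, y)
    print(x, matvec(normal_matrix(T), x))   # second list should be close to y
```

In exact arithmetic the iteration terminates after at most as many directions as the dimension of the space, which is why the loop above is capped at `len(y) - 1` further steps.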


Chapter 11 

Banach Spaces 


In this chapter, we explore deeper into the properties of operators and functionals
on general Banach spaces. At the same time, we generalize several definitions and
propositions that hold for Hilbert spaces. As Hilbert spaces are, in many ways, very
special and non-typical examples of Banach spaces, we need to modify these results
in several technical ways: There are no orthonormal bases, or Riesz correspondence,
or orthogonal projections available in Banach spaces.


11.1 The Open Mapping Theorem 

The following theorem holds the key to several unanswered questions that were 
raised earlier. 

Theorem 11.1 The Open Mapping Theorem 


Every onto continuous linear map between Banach spaces maps open sets 
to open sets. 


Proof Let $T : X \to Y$ be an onto operator between the Banach spaces $X$ and $Y$. Let
$U$ be an open subset of $X$, and let $x \in U$, so that $x \in B_\epsilon(x) \subseteq U$. If it can be shown
that $TB_X$ contains a ball $B_\delta(0)$, then

$$Tx \in B_{\delta\epsilon}(Tx) = Tx + \epsilon B_\delta(0) \subseteq Tx + \epsilon TB_X = TB_\epsilon(x) \subseteq TU$$

implies that $TU$ is an open set in $Y$, proving the theorem.

Now $X = \bigcup_{n=1}^\infty B_n(0)$, so $TX = \bigcup_{n=1}^\infty TB_n(0)$. But $TX = Y$ is complete, so
by Baire’s category theorem, not all the sets $TB_n(0)$ are nowhere dense: there must
be an $N$ such that $\overline{TB_N(0)}$ contains a ball. By re-scaling we find that $\overline{TB_X}$ contains
a ball $B_r(a)$. It follows that for every $y \in B_r(0)$ we have

$$a + y = \lim_{n\to\infty} Tx_n, \quad \text{for some } x_n \in B_X,$$

$$a - y = \lim_{n\to\infty} Tx'_n, \quad \text{for some } x'_n \in B_X,$$

$$y = \lim_{n\to\infty} T\Big(\frac{x_n - x'_n}{2}\Big) \in \overline{TB_X},$$

since $\|x_n - x'_n\| < 2$. Consequently we have that $B_r(0) \subseteq \overline{TB_X}$.

Claim: $\overline{TB_X} \subseteq TB_3(0)$. Let $y \in \overline{TB_X}$, so that there must be an $x_0 \in B_X$ such that
$\|y - Tx_0\| < r/2$; that is, $\|x_0\| < 1$ and $y - Tx_0 \in B_{r/2}(0) \subseteq \overline{TB_{1/2}(0)}$. But this
implies that there is an $x_1 \in B_{1/2}(0)$ such that $\|y - Tx_0 - Tx_1\| < r/4$. Continuing
in this fashion, we get a sequence $x_n$ such that

$$\|x_n\| < \frac{1}{2^n}, \qquad \|y - T(x_0 + \cdots + x_n)\| < \frac{r}{2^{n+1}}.$$

We can conclude that $x := \sum_{n=0}^\infty x_n$ converges absolutely, with $\|x\| \le \sum_{n=0}^\infty 2^{-n} = 2$,
and that $y = Tx \in TB_2[0] \subseteq TB_3(0)$.

Re-scaling the vectors in $B_r(0) \subseteq TB_3(0)$ gives $B_{r/3}(0) \subseteq TB_X$ and closes the
argument. □

Corollary 11.2 


Every bijective operator between Banach spaces is an isomorphism. 


With this fact, we are ready for the analogue of the first isomorphism theorem of 
vector spaces, which is a generalization of the corollary. 

Proposition 11.3 

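For any operator $T : X \to Y$ between Banach spaces,

$$X/\ker T \cong \operatorname{im} T \;\Leftrightarrow\; \operatorname{im} T \text{ is closed in } Y \;\Leftrightarrow\; \exists c > 0,\ \forall x \in X,\ \|x + \ker T\| \le c\|Tx\|.$$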


Proof The mapping $J : x + \ker T \mapsto Tx$ is well-defined because $T(x + a) = Tx$
for any $a \in \ker T$. It is obviously onto $\operatorname{im} T$, is 1-1 because

$$Tx = 0 \;\Leftrightarrow\; x \in \ker T \;\Leftrightarrow\; x + \ker T = \ker T,$$

is trivially linear, and continuous since, for $a_n \in \ker T$ chosen to satisfy $\|x + a_n\| \to
\|x + \ker T\|$,

$$\|Tx\| = \|T(x + a_n)\| \le \|T\|\,\|x + a_n\| \to \|T\|\,\|x + \ker T\|.$$

So $J$ is an isomorphism precisely when $J^{-1}$ is continuous, i.e., when the stated
inequality holds (Proposition 8.12).

By the corollary to the open mapping theorem, this is the case if the range of
$J$, namely $\operatorname{im} T$, is complete (closed in $Y$ by Proposition 4.7). For the converse,
$X/\ker T$ is complete (Proposition 8.18), as must be any isomorphic copy such as
$\operatorname{im} T$. □

Examples 11.4 

1. If $T \in B(X, Y)$ is an operator on Banach spaces, and $Y = \operatorname{im} T \oplus M$ for some
closed linear subspace $M$ of $Y$, then $\operatorname{im} T$ is closed in $Y$.

Proof The mapping $X/\ker T \to \operatorname{im} T$ defined in the proof above can be extended
to $(X/\ker T) \times M \to Y$ by $(x + \ker T, a) \mapsto Tx + a$; it is continuous and
bijective, hence an isomorphism. The conclusion follows since it sends the closed
set $(X/\ker T) \times \{ 0 \}$ to $\operatorname{im} T$.

2. ► Let $T : X \to Y$ be a linear map between Banach spaces; its graph $M :=
\{ (x, Tx) : x \in X \}$ is a linear subspace of $X \times Y$, and the map $J : M \to X$,
defined by $J(x, Tx) := x$ is 1-1, onto, linear, and continuous.

Closed Graph Theorem: If $M$ is also closed in $X \times Y$, then it is a Banach subspace,
and the open mapping theorem implies that $J$ is an isomorphism, so that

$$\|Tx\|_Y \le \|(x, Tx)\|_M \le c\|x\|_X$$

and $T$ must be continuous.

3. ► It is important that $Y$ be complete for the open mapping theorem to be valid.
The identity map $\ell^1 \to \ell^\infty$ is continuous and 1-1, but $\ell^1$ is not isomorphic
to its image, because the latter is not complete (in the $\infty$-norm). For example,
$x_N := (1, \tfrac12, \ldots, \tfrac1N, 0, \ldots)$ converge in the $\infty$-norm, but not to an $\ell^1$-sequence.

4. If $X$ has two complete norms, and $\|x\| \le c|\!|\!|x|\!|\!|$ for some fixed $c > 0$, then
the two norms are equivalent: the identity map $X_{|\!|\!|\cdot|\!|\!|} \to X_{\|\cdot\|}$ is continuous by
hypothesis, and obviously linear and bijective; so its inverse is also continuous.
Put differently, if two complete norms on $X$ are inequivalent, then one can find
vectors $x_n$ which are unit with respect to one norm, but growing indefinitely with
respect to the other. Clearly, this can only happen in infinite dimensions.
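
Example 3 can be made concrete numerically. The sketch below, in which the helper names are hypothetical and finite lists stand in for sequences that are eventually zero, shows that the vectors $x_N = (1, \tfrac12, \ldots, \tfrac1N, 0, \ldots)$ get close to each other in the $\infty$-norm while their $\ell^1$-norms (the harmonic sums) keep growing, so the two norms cannot be equivalent on $\ell^1$.

```python
def x(N):
    """The truncated harmonic sequence (1, 1/2, ..., 1/N); zeros are implicit."""
    return [1.0 / n for n in range(1, N + 1)]

def norm_1(a):
    return sum(abs(t) for t in a)

def norm_inf(a):
    return max(abs(t) for t in a)

def difference(a, b):
    """a - b for eventually-zero sequences stored as lists of unequal length."""
    length = max(len(a), len(b))
    a = a + [0.0] * (length - len(a))
    b = b + [0.0] * (length - len(b))
    return [s - t for s, t in zip(a, b)]

if __name__ == "__main__":
    for N in (10, 100, 1000):
        gap = norm_inf(difference(x(10 * N), x(N)))   # equals 1/(N+1), tends to 0
        print(N, gap, norm_1(x(N)))                   # while the 1-norm grows like log N
```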
Complementarity

We are now in a position to answer an earlier question about projections: It is not 
always possible to project continuously to a closed subspace. The following propo- 
sition determines exactly when such a projection exists: 

Proposition 11.5 

There is a projection onto a closed linear subspace $M$ of a Banach space
$X$ if, and only if,

$$X = M \oplus N$$

for some closed linear subspace $N$. In this case $M = \operatorname{im} P$, $N = \ker P$, and

$$M \oplus N \cong M \times N.$$


We say that M, N are complementary closed subspaces. 

Proof The forward implication has already been proved (Example 8.16(3)). 

Conversely, suppose $X = M \oplus N$, so that any $x = a + b$ for some $a \in M$, $b \in N$.
Uniqueness of $a$, $b$ follows from

$$a_1 + b_1 = x = a_2 + b_2 \;\Rightarrow\; a_1 - a_2 = b_2 - b_1 \in M \cap N = 0 \;\Rightarrow\; a_1 = a_2 \text{ AND } b_1 = b_2.$$

This allows us to define the function $P : X \to X$ by $P(x) := a$. It is linear since

$$P(\lambda x_1 + x_2) = P(\lambda a_1 + \lambda b_1 + a_2 + b_2) = \lambda a_1 + a_2 = \lambda Px_1 + Px_2.$$

When $x$ belongs to $M$ or $N$, we get the special cases

$$\forall a \in M,\ Pa = P(a + 0) = a; \qquad \forall b \in N,\ Pb = P(0 + b) = 0,$$

so $\operatorname{im} P = M$ and $\ker P = N$, since any $x \in \ker P$ satisfies $0 = Px = a$ implying
$x = b \in N$.

$P$ is a continuous projection: $P^2 = P$ since, for any $x = a + b \in M \oplus N$,
$P^2x = Pa = a = Px$. Finally, the map $J : M \times N \to X$, $J(a, b) := a + b$,
between Banach spaces, is 1-1, onto and continuous

$$\|a + b\|_X \le \|a\|_X + \|b\|_X = \|(a, b)\|_{M \times N}$$

and so is an isomorphism by the open mapping theorem. Therefore
Every subspace $M$ can be extended by another subspace $N$ such that $X = M \oplus N$
(by extending a basis for $M$ to span $X$) but complementarity requires $M$, $N$ to be
closed.

Examples 11.6 

1. Finite-dimensional subspaces are always complemented.

Proof The projection to $M = [\![e_1, \ldots, e_N]\!]$ is simply $x \mapsto \delta_1(x)e_1 + \cdots +
\delta_N(x)e_N$, where $\delta_m$ are the dual basis for $M^*$ ($\delta_m(e_n) := \delta_{nm}$). Although $\delta_m$ are
defined on $M$, they can be extended to $X^*$ as seen later (Theorem 11.17).

2. Finite-codimensional closed subspaces are complemented.

Proof Let $e_1 + M, \ldots, e_n + M$ be a basis for $X/M$, and let $N := [\![e_1, \ldots, e_n]\!]$
(complete). Then, for any $x$,

$$x + M = \sum_{i=1}^{n} \alpha_i (e_i + M) = \sum_{i=1}^{n} \alpha_i e_i + M =: a + M,$$

which shows $x - a \in M$, $a \in N$, so $x \in M + N$. If $x \in M \cap N$, then the above
identity gives $M = \sum_{i=1}^{n} \alpha_i (e_i + M)$, so $\alpha_i = 0$ (linear independence of $e_i + M$)
and $x = 0$.

3. For Banach spaces, if $T : X \to Y$ is onto, and $X = \ker T \oplus M$ then it follows
that $Y \cong X/\ker T \cong M$ is embedded in $X$.

4. * A Banach theorem: If $X$ is a separable Banach space, then there is an onto
operator $T : \ell^1 \to X$.

(‘Proof’ Let $x_n$ be dense in $B_X$, and let $T : \ell^1 \to X$ be defined by $T(e_n) := x_n$,
extended linearly. Then it follows easily that $\|T\| = 1$, so $\overline{TB_{\ell^1}} = \overline{B_X}$, and by an
argument similar to the proof of the open mapping theorem, $TB_{\ell^1} = B_X$.)
Hence, if $X$ is not embedded in $\ell^1$, then $\ker T$ is not complemented (by the
previous example).

Exercises 11.7 

1. For a projection $P : X \to X$, $X/\operatorname{im} P \cong \ker P$, while $\|x + \ker P\| \le \|Px\|$.

2. Second isomorphism theorem: If $M$, $N$, and $M + N$ are closed subspaces of a
Banach space, then $(M + N)/N \cong M/(M \cap N)$, using the map $M \to (M + N)/N$,
$x \mapsto x + N$.

3. Third isomorphism theorem: Let $M \subseteq N$ be closed subspaces of $X$, then
$(X/M)/(N/M) \cong X/N$ using the map $X/M \to X/N$, $x + M \mapsto x + N$. If $M$ is
finite-codimensional then $\operatorname{codim} N \le \operatorname{codim} M$.

4. Let $T : X \to Y$ and $S : X \to Z$ be operators on Banach spaces.

(a) If $M$ is a closed linear subspace of $\ker T$, then $x + M \mapsto Tx$ is well defined,
linear, and continuous.

(b) If $S$ is onto and $Sx = 0 \Rightarrow Tx = 0$, then $Sx \mapsto Tx$ is a well-defined
operator in $B(Z, Y)$.

5. * Suppose the Banach space $X$ has a Schauder basis $e_n$ (of unit norm). For
$x = \sum_n \alpha_n e_n$, it can be shown that $|\!|\!|x|\!|\!| := \sup_n \|\sum_{i=1}^{n} \alpha_i e_i\|$ exists and is a
complete norm. Show $\|x\| \le |\!|\!|x|\!|\!|$ and deduce that the map $\phi_n : x \mapsto \alpha_n$ is in
$X^*$. These functionals form a Schauder basis for $X^*$, called the bi-orthogonal or
dual basis, and satisfy $\phi_n(e_m) = \delta_{nm}$.

6. Let $M$, $N$ be closed subspaces of a Banach space, with $M \cap N = \{ 0 \}$. Then
$M + N$ is closed $\Leftrightarrow$ $P : M + N \to M$, $x + y \mapsto x$, is continuous.

7. If $\phi : X \to \mathbb{F}$ is linear with $\ker\phi$ closed, then $\phi$ is continuous.

8. If $M$ is a complemented closed subspace of $X$, then $X \cong (X/M) \times M$.

9. If $X = M \oplus N$ with $M$, $N$ closed, then there is a minimum separation $\|u - v\| \ge c$
between any unit vectors $u \in M$, $v \in N$.


11.2 Compact Operators 

A linear map is continuous when it maps bounded sets to bounded sets. There is a 
special subclass of linear maps that go further: 

Definition 11.8 


A linear mapping between Banach spaces is called compact when it maps 
bounded sets to totally bounded sets. 


Easy Consequences 

1. Compact linear maps are continuous (originally called completely continuous).

2. If $T$, $S$ are compact operators, then so are $T + S$ and $\lambda T$ (since $B$ bounded implies
$\lambda TB$ and subsets of $TB + SB$ are totally bounded (Proposition 7.13)).

3. The identity map $I : X \to X$ is not compact when the Banach space is infinite
dimensional (it cannot convert the unit ball to a totally bounded set (Proposition
8.23)).

4. It is enough to show that $T$ maps the unit ball to a totally bounded set for $T$ to be
compact (since $B \subseteq B_r(0) \Rightarrow TB \subseteq rTB_X$).
Proposition 11.9

If $T$ is compact and $S$ continuous linear, then $ST$ and $TS$ are compact
(when defined).

If $T_n$ are compact and $T_n \to T$ then $T$ is compact.

For a compact operator T, im T is separable, and is closed only when 
finite-dimensional. 


Proof (i) Starting from a bounded set, T maps it to a totally bounded set and S, being 
Lipschitz, maps this to another totally bounded set (Proposition 6.7); or starting with 
a bounded set, S maps it to another bounded set (Exercise 4.17(3)), which is then 
mapped by T to a totally bounded set. 

(ii) Let $B$ be a bounded set, with its vectors having norm at most $c$. Then for any
$x \in B$, $Tx = T_nx + (T - T_n)x$, and

$$\|(T - T_n)x\| \le \|T - T_n\|\,\|x\| \le c\|T - T_n\| \to 0.$$

Hence for $n$ large enough, independent of $x \in B$, $\|(T - T_n)x\| < \epsilon/2$; in other
words $(T - T_n)B \subseteq B_{\epsilon/2}(0)$. Moreover $T_nB$ is totally bounded and so,

$$TB \subseteq T_nB + (T - T_n)B \subseteq \bigcup_{i=1}^{N} B_{\epsilon/2}(x_i) + B_{\epsilon/2}(0) = \bigcup_{i=1}^{N} B_\epsilon(x_i).$$

Thus $TB$ is totally bounded and $T$ is compact.

(iii) Totally bounded sets are separable (Example 6.6(3)), so the image of $T$,

$$\operatorname{im} T = TX = T\bigcup_{n=1}^{\infty} B_n(0) = \bigcup_{n=1}^{\infty} TB_n(0),$$

being the countable union of separable sets, is separable (Exercise 4.21(3)).

If $\operatorname{im} T$ is complete, then it is a Banach space in its own right. The open mapping
theorem can be used to conclude that the unit (open) ball $B_X$ is mapped to an open
and totally bounded set $TB_X \subseteq \operatorname{im} T$. As 0 is an interior point of it, there is a
totally bounded ball $B_r(0) \cap \operatorname{im} T \subseteq TB_X$. This can only happen if $\operatorname{im} T$ is finite
dimensional. □

Examples 11.10 

1. An operator whose image has finite dimension (finite rank) is compact. The reason
is that, in a finite-dimensional space, bounded sets are necessarily totally bounded
(Proposition 8.23, Exercise 6.9(5)). For example, matrices and functionals are
compact operators of finite rank.
2. ► A common way of showing that an operator is compact is to show that it is the
limit of operators of finite rank.

For example let $T : \ell^2 \to \ell^2$ be defined by $T(a_n) := (a_n/n)$. First cleave the
operator to $T_N$ defined by $T_N(a_n) := (a_1/1, a_2/2, \ldots, a_N/N, 0, 0, \ldots)$. This
operator maps $\ell^2$ to an $N$-dimensional space. Showing it is continuous would
imply it is compact of finite rank:

$$\|T_N(a_n)\|^2_{\ell^2} = \sum_{n=1}^{N} |a_n/n|^2 \le \sum_{n=1}^{N} |a_n|^2 \le \|(a_n)\|^2_{\ell^2}.$$

Furthermore, $T_N \to T$:

$$\|(T - T_N)(a_n)\|^2_{\ell^2} = \sum_{n=N+1}^{\infty} |a_n/n|^2 \le \frac{1}{N^2}\sum_{n=N+1}^{\infty} |a_n|^2 \le \|(a_n)\|^2_{\ell^2}/N^2.$$

Hence $\|T - T_N\| \le 1/N \to 0$ as $N \to \infty$ as required.

3. $T_Nf(x) := \sum_{n=-N}^{N} \hat{f}(n)e^{2\pi inx}$ is an example of an operator of finite rank on
$L^1[0,1]$.

4. ► If $T$ is a compact operator on Banach spaces and $(x_n)$ is bounded, then $(Tx_n)$
has a convergent subsequence.

Proof The sequence $(Tx_n)$ is totally bounded, hence has a Cauchy subsequence,
which converges by virtue of the completeness of the codomain.
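
The estimate $\|T - T_N\| \le 1/N$ of Example 2 can be observed on finite sections of $\ell^2$. The sketch below is illustrative only: finite lists stand in for square-summable sequences, and the function names are hypothetical. It compares $\|(T - T_N)a\|$ with $\|a\|$ for a sample vector, and checks that the unit vector $e_{N+1}$ attains the value $1/(N+1)$.

```python
import math

def T(a):
    """The diagonal operator (a_n) -> (a_n / n), with n starting at 1."""
    return [a_n / n for n, a_n in enumerate(a, start=1)]

def T_N(a, N):
    """Finite-rank truncation: keep the first N components of T(a)."""
    return [a_n / n if n <= N else 0.0 for n, a_n in enumerate(a, start=1)]

def norm_2(a):
    return math.sqrt(sum(t * t for t in a))

def tail_ratio(a, N):
    """||(T - T_N) a|| / ||a|| for a non-zero vector a."""
    diff = [u - v for u, v in zip(T(a), T_N(a, N))]
    return norm_2(diff) / norm_2(a)

if __name__ == "__main__":
    a = [(-1.0) ** k / (k + 1) for k in range(500)]    # a sample vector
    for N in (1, 10, 100):
        print(N, tail_ratio(a, N), 1.0 / N)            # ratio never exceeds 1/N
```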

An important source of examples of compact operators is the following: 

Proposition 11.11 


If the kernel $k$ is a continuous function $[a,b] \times [c,d] \to \mathbb{C}$, then the integral
operator $T : C[a,b] \to C[c,d]$,

$$Tf(y) := \int_a^b k(x, y) f(x)\,dx$$

is compact.


Proof Let $F$ be the unit ball of functions in $C[a,b]$. For any $y \in [c,d]$, and $f \in F$,

$$|Tf(y)| \le (b - a)\|k\|_{L^\infty}\|f\|_{L^\infty} \le (b - a)\|k\|_{L^\infty},$$

so $(TF)[c,d]$ is bounded in $\mathbb{C}$, hence totally bounded.




As $k$ is continuous on the compact set $[a,b] \times [c,d]$, it is uniformly continuous
(Proposition 6.17). So for any $\epsilon > 0$ there is a $\delta > 0$ such that for $|y_1 - y_2| < \delta$,

$$|Tf(y_1) - Tf(y_2)| \le \int_a^b |k(x, y_1) - k(x, y_2)|\,|f(x)|\,dx \le \epsilon(b - a).$$

This implies that $Tf$ is continuous and, as $\delta$ is independent of $f$, $TF$ is
equicontinuous. By the Arzelà-Ascoli theorem (Theorem 6.26), $TF$ is totally bounded in
$C[c,d]$, and the integral operator $T$ is compact. □


Fredholm Operators 
Definition 11.12 


A Fredholm operator is one whose kernel is finite-dimensional and whose 
image has finite codimension. The index of a Fredholm operator is the differ- 
ence 

index(T) := dim ker T — codim im T . 


A Fredholm operator $T : X \to Y$ gives rise to decompositions

$$X = \ker T \oplus M, \qquad Y = \operatorname{im} T \oplus N,$$

for some closed linear subspaces $M$, $N$ by Examples 11.6(1, 2) and 11.4(1). The
restricted operator $R : M \to \operatorname{im} T$, $x \mapsto Tx$ is then bijective and continuous, and
thus an isomorphism by the open mapping theorem.

Proposition 11.13 Index Theorem 


The composition of Fredholm operators is again Fredholm, and

$$\operatorname{index}(ST) = \operatorname{index}(S) + \operatorname{index}(T).$$


Proof Let $T \in B(X, Y)$, $S \in B(Y, Z)$, both Fredholm, with $n := \dim(\ker T)$,
$m := \operatorname{codim}(\operatorname{im} S)$. $Y$ decomposes as

$$Y = N \oplus \operatorname{im} T = \ker S \oplus M = A \oplus B \oplus C \oplus D$$

where

$A := \ker S \cap N$ of dimension $a$,
$B := \operatorname{im} T \cap \ker S$ of dimension $b$,
$C := M \cap N$ of dimension $c$,
$D := M \cap \operatorname{im} T$.

Then $\dim\ker ST = n + b$, $\operatorname{codim}\operatorname{im} ST = c + m$, both finite, and the index of $ST$
is $n + b - c - m = (a + b - m) + (n - a - c) = \operatorname{index}(S) + \operatorname{index}(T)$. □

What is the connection with compact operators, one might ask? 

Proposition 11.14 


An operator $T : X \to Y$ on Banach spaces is Fredholm, if and only if $T$
is invertible “up to compact operators”, that is, there exist $K_1 \in B(X)$,
$K_2 \in B(Y)$ compact and $S \in B(Y, X)$, such that

$$ST = I + K_1, \qquad TS = I + K_2.$$


In fact, $K_1$, $K_2$ can be taken to be of finite rank.

Proof Suppose $T$ is Fredholm, so $X = \ker T \oplus M$, $Y = \operatorname{im} T \oplus N$, and the map
$R : M \to \operatorname{im} T$ is an isomorphism. Let $P$ be the finite-rank projection onto $\ker T$
with kernel $M$, and $Q$ that projection onto $N$ along $\operatorname{im} T$ ($\therefore QT = 0 = TP$), and
let $S := R^{-1}(I - Q)$. Then $TR^{-1} = I$ and $R^{-1}T = I - P$, so for any $x \in X$,
$y \in Y$,

$$STx = R^{-1}(I - Q)Tx = R^{-1}Tx = (I - P)x,$$

$$TSy = TR^{-1}(I - Q)y = (I - Q)y,$$

so $ST = I - P$, $TS = I - Q$.

Conversely,

$$\ker T \subseteq \ker ST = \ker(I + K_1) =: M.$$

For $x \in M$, $K_1x = -x$, i.e., $I = -K_1|_M$ is compact, hence $M$, and thus $\ker T$, are
finite-dimensional. Similarly,

$$\operatorname{im} T \supseteq \operatorname{im} TS = \operatorname{im}(I + K_2).$$

Now, in general, the operator $R := I + K$ has a closed image, for any compact
operator $K$. For suppose there are vectors $y_n$ such that

$$\|y_n + \ker R\| = 1 \text{ AND } Ry_n \to 0.$$

The first condition implies that there are vectors $v_n \in \ker R$ such that $1 \le \|u_n\| \le 2$,
where $u_n := y_n - v_n$. As $K$ is compact and $u_n$ bounded, there is a subsequence $u_m$
such that $Ku_m \to -y$ (Example 11.10(4)). Subtracting this from $Ru_m = Ry_m \to 0$
gives $u_m \to y$. Consequently, $Ru_m$ converges to both $Ry$ and to 0, that is, $y \in \ker R$.
However, $y_m - v_m \to y$ then contradicts the given condition that $y_m$ are at unit
distance from $\ker R$. It must be the case that there is a constant $c > 0$ such that

$$\|y + \ker R\| \le c\|Ry\|,$$

and $\operatorname{im} R$ is closed (Proposition 11.3).

It should be clear that the map $Y \to Y/\operatorname{im} R$ defined by

$$y \mapsto Ky + \operatorname{im} R = (y + Ky) - y + \operatorname{im} R = -y + \operatorname{im} R$$

is both compact and onto, hence it is of finite rank. This means that $Y/\operatorname{im} R$, and by
implication $Y/\operatorname{im} T$, are finite-dimensional. □

Exercises 11.15 

1. The multiplication operator $(a_n) \mapsto (b_na_n)$ (on $\ell^1$, $\ell^2$, or $\ell^\infty$) is compact $\Leftrightarrow$
$b_n \to 0$.

2. The operator $V(a_n) := (0, a_0, a_1/2, a_2/3, \ldots)$ (on $\ell^1$, say) is compact. But the
shift operators are not.

3. The operator $Tx := \sum_{n=1}^{N} (\phi_nx)y_n$, for any $\phi_n \in X^*$, $y_n \in Y$, is of finite rank.
In the limit $N \to \infty$ it gives a compact operator if $\sum_{n=1}^{\infty} \|\phi_n\|\,\|y_n\| < \infty$.
In fact, any operator of finite rank must be of this type $Tx = \sum_{n=1}^{N} (\phi_nx)e_n$ with
$\phi_n \in X^*$ and $e_n$ a basis for $\operatorname{im} T$.

4. If $S$, $T$ are linear of finite rank, then so are $\lambda T$ and $S + T$; if $S$ is any linear map,
then $ST$ and $TS$ are of finite rank, when defined.

5. No isomorphism between infinite-dimensional Banach spaces $X$, $Y$, can be
compact. If $T : X \to Y$ is compact and invertible, then $T^{-1}$ cannot be continuous.

6. If $T : X \to Y$ is compact, then so is its restriction to a closed subset $M \subseteq X$,
$T|_M : M \to Y$.

7. The index of an $m \times n$ matrix is $n - m$.

8. The right-shift operator $R$ (on $\ell^\infty$ say) is Fredholm with index $-1$; that of the
left-shift operator is $+1$. The index of a projection is 0 when defined.
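
Exercise 7 and the index theorem can be checked on small matrices. The following sketch assumes that an $m \times n$ matrix is regarded as an operator $\mathbb{F}^n \to \mathbb{F}^m$, computes the rank exactly with rational arithmetic, and forms the index as $\dim\ker - \operatorname{codim}\operatorname{im}$; the helper names `rank`, `index`, and `matmul` are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix via Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def index(rows):
    """dim ker T - codim im T for an m x n matrix T acting from F^n to F^m."""
    m, n = len(rows), len(rows[0])
    rk = rank(rows)
    return (n - rk) - (m - rk)

def matmul(S, T):
    return [[sum(S[i][k] * T[k][j] for k in range(len(T)))
             for j in range(len(T[0]))] for i in range(len(S))]

if __name__ == "__main__":
    T = [[1, 2, 3], [4, 5, 6]]               # 2 x 3: maps F^3 to F^2
    S = [[1, 0], [0, 1], [1, 1], [2, 3]]     # 4 x 2: maps F^2 to F^4
    print(index(T), index(S), index(matmul(S, T)))
```

The rank cancels in the difference, which is why the index depends only on the shape of the matrix, and the indices add under composition as in Proposition 11.13.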

11.3 The Dual Space X* 

Functionals provide very useful tools in converting vectors to numbers, and vector
sequences to more amenable numerical sequences. Thus if we are uncertain whether
$x_n \to x$ then we might try to see if $\phi x_n \to \phi x$ for some continuous functional $\phi$; if it
does not converge, neither does $x_n$. Moreover, $X^*$ is a sort of mirror-image, or dual,
of $X$: every vector in $X$ can be thought of as a linear operator $x : \mathbb{F} \to X$, $\lambda \mapsto \lambda x$,
while functionals are linear operators $\phi : X \to \mathbb{F}$, $x \mapsto \phi x$. It turns out that the
space $X^*$ is at least as “rich” as the normed space $X$, in the sense that $X$ can be
recovered from $X^*$ as a subspace of $X^{**}$.

Examples 11.16 

1. The functionals of a Hilbert space are in 1-1 correspondence with the vectors by 
the Riesz representation theorem. 

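2. Recall that $\ell^{1*} \cong \ell^\infty$, $\ell^{2*} \cong \ell^2$, and $c_0^* \cong \ell^1$ (Propositions 9.6, 9.9, and
Theorem 9.3).

3. We will see later that every functional on $B(\mathbb{C}^N)$ is of the type $\phi T = \operatorname{tr}(ST)$
where $\operatorname{tr} S$ is the trace of the matrix $S$ (Theorems 15.31 and 10.16).

4. $(X \times Y)^* \cong X^* \times Y^*$, via the isomorphism $(\phi, \psi) \mapsto \omega$ where $\omega(x, y) :=
\phi x + \psi y$.

5. For $\phi_i, \psi \in X^*$, $\psi\bigcap_{i=1}^{n} \ker\phi_i = 0 \Leftrightarrow \psi \in [\![\phi_1, \ldots, \phi_n]\!]$.

Proof If $\psi\ker\phi = 0$, ($\phi \ne 0$), then the map $\mathbb{C} \to \mathbb{C}$, $\phi(x) \mapsto \psi(x)$ is
well-defined and linear, hence must be multiplication by some scalar $\lambda$, i.e., $\psi = \lambda\phi$.
Suppose $\psi\bigcap_{i=1}^{n+1} \ker\phi_i = 0$. On the space $\ker\phi_{n+1}$, if $\phi_ix = 0$, $i = 1, \ldots, n$,
then $\psi x = 0$, so by induction, $\psi = \sum_{i=1}^{n} \alpha_i\phi_i$. Let $\xi := \psi - \sum_{i=1}^{n} \alpha_i\phi_i$; then
$\phi_{n+1}x = 0 \Rightarrow \xi x = 0$, so $\xi = \alpha_{n+1}\phi_{n+1}$ as required. The converse is easy.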

Our first result concerning functionals is a powerful theorem which asserts the 
existence of a functional on the whole space X, starting from a “fragment” of it on 
a, perhaps much smaller, subspace Y . Like many existence-type theorems, the path 
to construct such an extension is not straightforward. 

Theorem 11.17 The Hahn-Banach Theorem 


Let $Y$ be a subspace of a normed space $X$. Then every functional $\phi \in Y^*$
can be extended to some $\tilde\phi \in X^*$, with $\|\tilde\phi\|_{X^*} = \|\phi\|_{Y^*}$.


Proof Let us try to extend $\phi$ from a functional on $Y$ to a functional $\tilde\phi$ on $Y + [\![v]\!]$,
for a vector $v \notin Y$, by selecting a number $\tilde\phi v := c$. Once $c$ is chosen, we are forced
to set $\tilde\phi(y + \lambda v) := \phi y + \lambda c$, for any $\lambda v \in [\![v]\!]$, to make $\tilde\phi$ a linear extension of $\phi$;
and to retain continuity with $\|\tilde\phi\| = \|\phi\|$, we need, for any $y \in Y$ and $\lambda \in \mathbb{F}$ ($\lambda \ne 0$),

$$|\phi y + \lambda c| = |\tilde\phi(y + \lambda v)| \le \|\phi\|\,\|y + \lambda v\|$$
$$\Leftrightarrow\; |\phi(y/\lambda) + c| \le \|\phi\|\,\|y/\lambda + v\|$$
$$\Leftrightarrow\; |\phi y + c| \le \|\phi\|\,\|y + v\|, \qquad (11.1)$$

(since the vectors $y/\lambda$ account for all of $Y$). To proceed, we consider first the case
of real scalars and then generalize to the complex field.

Real Normed Space: Let us suppose that $\phi$ is real-valued. Thus we are required
to find a $c \in \mathbb{R}$ that satisfies inequality (11.1)

$$-\phi y - \|\phi\|\,\|y + v\| \le c \le -\phi y + \|\phi\|\,\|y + v\|, \qquad \forall y \in Y.$$

Is this possible? Yes, because for any $y_1, y_2 \in Y$,

$$\phi y_1 - \phi y_2 \le |\phi(y_1 - y_2)| \le \|\phi\|\,\|y_1 - y_2\| \le \|\phi\|(\|y_1 + v\| + \|y_2 + v\|) = \|\phi\|\,\|y_1 + v\| + \|\phi\|\,\|y_2 + v\|$$

$$\Rightarrow\; -\phi y_2 - \|\phi\|\,\|y_2 + v\| \le -\phi y_1 + \|\phi\|\,\|y_1 + v\|.$$

Since $y_1$, $y_2$ are arbitrary vectors in $Y$, there must be a constant $c$ separating the two
sides of the inequality, as sought. Choosing any such $c$ gives an extended functional
with $\|\tilde\phi\| \le \|\phi\|$ (inequality (11.1)); but $\tilde\phi$ extends $\phi$, so $\|\tilde\phi\| = \|\phi\|$.
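
The one-step extension in the real case can be explored numerically. The sketch below uses a toy setting that is not taken from the text: $X = \mathbb{R}^2$, $Y = [\![(1,0)]\!]$, $\phi(t, 0) := t$ (so $\|\phi\| = 1$ for both norms used), and $v = (0, 1)$. It estimates the admissible interval for $c$ by sampling $y$ on a grid, once for the 1-norm and once for the $\infty$-norm; all function names are hypothetical.

```python
def norm_1(u):
    return abs(u[0]) + abs(u[1])

def norm_inf(u):
    return max(abs(u[0]), abs(u[1]))

def extension_interval(phi, norm, phi_norm, v, ys):
    """Sampled bounds for c:  -phi(y) - |phi| |y+v|  <=  c  <=  -phi(y) + |phi| |y+v|."""
    shifted = [((y[0] + v[0], y[1] + v[1]), y) for y in ys]
    lower = max(-phi(y) - phi_norm * norm(s) for s, y in shifted)
    upper = min(-phi(y) + phi_norm * norm(s) for s, y in shifted)
    return lower, upper

phi = lambda y: y[0]                                # the functional on Y = span{(1, 0)}
v = (0.0, 1.0)                                      # the new direction
ys = [(i / 100.0, 0.0) for i in range(-500, 501)]   # sample points of Y

if __name__ == "__main__":
    print(extension_interval(phi, norm_1, 1.0, v, ys))     # roughly (-1, 1)
    print(extension_interval(phi, norm_inf, 1.0, v, ys))   # roughly (0, 0)
```

In this toy setting the extension is far from unique for the 1-norm (any $c \in [-1, 1]$ works), whereas the $\infty$-norm forces $c = 0$; this is consistent with the proof, which only asserts that the interval is non-empty.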

Complex Normed Space: Now consider the case when the functional is complex-valued. It decomposes into its real and imaginary parts $\phi = \phi_1 + i\phi_2$, but the two are not independent of each other because

\[ \phi_1(iy) + i\phi_2(iy) = \phi(iy) = i\phi y = i\phi_1(y) - \phi_2(y) \]

so that $\phi_2(y) = -\phi_1(iy)$. Being real-valued, they cannot possibly belong to $Y^*$, but they do qualify as functionals on $Y$ when restricted to the real scalars,

\[
\begin{aligned}
\phi_1(y_1 + y_2) &= \operatorname{Re}(\phi(y_1) + \phi(y_2)) = \phi_1(y_1) + \phi_1(y_2), \\
\phi_1(\lambda y) &= \operatorname{Re}\phi(\lambda y) = \lambda\phi_1(y), \qquad \forall\lambda \in \mathbb{R}, \\
|\phi_1(y)| &= |\operatorname{Re}\phi y| \le |\phi y| \le \|\phi\|\,\|y\|
\end{aligned}
\]

(for $\phi_2$, substitute Re with Im). So they have real-valued extensions $\tilde\phi_i$ to $Y + [\![v]\!]$ that are linear over the real scalars; actually, extending $\phi_1$ to $\tilde\phi_1$ automatically gives the extension for $\phi_2$. That is, define $\tilde\phi(x) := \tilde\phi_1(x) - i\tilde\phi_1(ix)$. This is obviously linear over the real scalars since $\tilde\phi_1$ is. It is also linear over the complex scalars because

\[ \tilde\phi(ix) = \tilde\phi_1(ix) - i\tilde\phi_1(-x) = i\bigl(-i\tilde\phi_1(ix) + \tilde\phi_1(x)\bigr) = i\tilde\phi(x). \]

Moreover it is continuous since, using the polar form $\tilde\phi x = |\tilde\phi x|e^{i\theta_x}$,

\[ |\tilde\phi x| = e^{-i\theta_x}\tilde\phi x = \tilde\phi(e^{-i\theta_x}x) = \tilde\phi_1(e^{-i\theta_x}x) \le \|\tilde\phi_1\|\,\|x\| = \|\phi_1\|\,\|x\| \le \|\phi\|\,\|x\|, \]

so that $\|\tilde\phi\| \le \|\phi\|$; in fact, equality holds because the domain of $\tilde\phi$ includes that of $\phi$.

Extending to $X$: If $X$ can be generated from $Y$ and a countable number of vectors $v_n$, then $\phi$ can be extended in steps, first to some $\phi_1$ acting on $Y + [\![v_1]\!]$, then to $\phi_2$ acting on $Y + [\![v_1]\!] + [\![v_2]\!]$, etc. The final extension is then $\tilde\phi x := \phi_n x$ for $x \in X$, whenever $x \in Y + [\![v_1, \ldots, v_n]\!]$. If these vectors are only dense in $X$ (e.g. when $X$ is separable), $\tilde\phi$ can be extended further with the same norm via $\tilde\phi(x) := \lim_{n\to\infty}\tilde\phi(x_n)$ when $x_n \to x$, as a special case of extending a linear continuous function to the completion spaces (Example 8.9(4)).

But even if $X$ needs an uncountable number of generating vectors, “Hausdorff’s maximality principle” can be applied to conclude that the extension goes through to $X$. Let $\mathcal{M}$ be the collection of functionals $\phi_M$ acting on linear subspaces $M$ containing $Y$ and extending $\phi$ with the same norm

\[ \mathcal{M} := \{\, \phi_M \in M^* : \forall y \in Y,\ \phi_M y = \phi y, \text{ AND } \|\phi_M\| = \|\phi\| \,\}. \]

By Hausdorff's maximality principle, $\mathcal{M}$ contains a maximal chain of subspaces $\{M_\alpha\}$, where $\phi_\alpha$ extends $\phi_\beta$ whenever $M_\beta \subseteq M_\alpha$. But $E := \bigcup_\alpha M_\alpha$ also allows an extension of $\phi$, namely $\psi(x) := \phi_\alpha x$ for $x \in M_\alpha$. It is well-defined because $x \in M_\alpha \cap M_\beta$ implies $M_\alpha \subseteq M_\beta$ say, so $\phi_\alpha x = \phi_\beta x$. It is linear and continuous with the same norm as $\phi$,

\[ |\psi x| = |\phi_\alpha x| \le \|\phi_\alpha\|\,\|x\| = \|\phi\|\,\|x\|. \]

Hence $\psi$ is a maximal extension in $\mathcal{M}$; in fact, $E = X$, for were it to exclude any vector $v$, the first part of the proof assures us of an extension that includes $v$, contradicting the maximality of $\psi$. □
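
The one-step extension in the real case can be explored numerically. The following is a minimal sketch under assumptions that are not in the text: $X = \mathbb{R}^2$ with the Euclidean norm, $Y$ the horizontal axis, $\phi(t, 0) := t$ (so $\|\phi\| = 1$) and $v := (0, 1)$; the helper names are hypothetical. It estimates, over a finite grid of $y \in Y$, the interval of values $c$ allowed by inequality (11.1).

```python
import math

def euclidean_norm(p):
    """Euclidean norm on R^2 (an assumed choice of norm for this sketch)."""
    return math.hypot(p[0], p[1])

def admissible_interval(phi, norm, v, ys, phi_norm):
    """Grid estimate of the set of c with |phi(y) + c| <= ||phi|| ||y + v||."""
    shifted = [((y[0] + v[0], y[1] + v[1]), phi(y)) for y in ys]
    lower = max(-val - phi_norm * norm(p) for p, val in shifted)
    upper = min(-val + phi_norm * norm(p) for p, val in shifted)
    return lower, upper

# Sample Y = {(t, 0)} for t in [-1000, 1000] in steps of 0.5.
ys = [(t / 2.0, 0.0) for t in range(-2000, 2001)]
lower, upper = admissible_interval(lambda y: y[0], euclidean_norm,
                                   (0.0, 1.0), ys, 1.0)
print(lower, upper)  # both ends lie close to 0 on this grid
```

On this grid the interval is a small neighbourhood of $0$ and shrinks as the grid widens, in line with the fact that norm-preserving extensions are unique in a Hilbert space (here $c = 0$). Swapping in another norm may leave a whole interval of admissible $c$.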

Proposition 11.18

For any $x \neq 0$, there is a unit $\phi \in X^*$ with $\phi x = \|x\|$.

More generally, if $M$ is a closed linear subspace and $x \notin M$, then there is a functional $\phi \in X^*$ with $\|\phi\| = 1$, such that

\[ \phi M = 0, \qquad \phi x \neq 0. \]

Proof If $x \neq 0$, there are non-zero functionals on $[\![x]\!]$, such as $\psi(\lambda x) := \lambda c$ ($c \neq 0$); in particular, to satisfy the requirement $\|\phi\| = 1$, choose $\phi(\lambda x) := \lambda\|x\|$. By the Hahn-Banach theorem, it has an extension to all of $X$, with the same norm.

More generally, given $x \notin M$, form the linear subspace

\[ Y := [\![x]\!] + M = \{\, \lambda x + a : \lambda \in \mathbb{C},\ a \in M \,\}. \]

$Y^*$ contains the functional defined by $\psi(\lambda x + a) := \lambda\|x + M\|$. It is clearly zero when $\lambda = 0$ and is linear and continuous since





\[
\begin{aligned}
\psi\bigl(\lambda_1 x + a_1 + \mu(\lambda_2 x + a_2)\bigr) &= (\lambda_1 + \mu\lambda_2)\|x + M\| = \psi(\lambda_1 x + a_1) + \mu\,\psi(\lambda_2 x + a_2), \\
|\psi(\lambda x + a)| &= |\lambda|\,\|x + M\| = \|\lambda x + M\| \le \|\lambda x + a\|
\end{aligned}
\]

and in fact $\|\psi\| = 1$,

\[ |\psi(x + a_n)| / \|x + a_n\| = \|x + M\| / \|x + a_n\| \to 1 \]

for $a_n \in M$ chosen so that convergence of $\|x + a_n\| \to \|x + M\|$ occurs (Proposition 8.18). So $\psi$ can be extended to a functional $\phi$ on all of $X$ with the same norm. □

The Hahn-Banach theorem and its corollaries show that there is a ready supply of functionals on normed spaces; admittedly, this does not sound exciting, but consider that there are vector spaces (not normed), such as $L^p(\mathbb{R})$ with $p < 1$, that have only trivial continuous functionals. For our purposes, its greater importance lies in its ability to show a certain duality between $X$ and its space of functionals $X^*$. For example, the dual of the statement $\|\phi\| = \sup_{\|x\|=1} |\phi x|$ is

Proposition 11.19

\[ \|x\| = \sup_{\|\phi\|=1} |\phi x|, \qquad \|T\| = \sup_{\|\phi\|=1=\|x\|} |\phi T x|. \]


Proof $|\phi x| \le \|x\|$ for all unit $\phi \in X^*$. But the functional just constructed satisfies $\phi x = \|x\|$ and $\|\phi\| = 1$, so $\sup_{\|\phi\|=1} |\phi x| = \|x\|$.

This in turn allows us to deduce

\[ \|T\| = \sup_{\|x\|=1} \|Tx\| = \sup_{\|x\|=1}\,\sup_{\|\phi\|=1} |\phi T x|. \]

□


These identities generalize those for Hilbert spaces (Exercise 10.18(1)). 
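
The first identity can be checked by hand in a small case. The sketch below assumes $X = \mathbb{R}^4$ with the 1-norm, whose dual norm is the sup-norm; this setting and the helper names are illustrative assumptions, not taken from the text. The sign pattern of $x$ is a unit functional attaining $\|x\|_1$, while randomly sampled unit functionals never exceed it.

```python
import random

def l1_norm(x):
    return sum(abs(t) for t in x)

def sup_norm(y):
    return max(abs(t) for t in y)

def pair(y, x):
    """Value of the functional represented by y on the vector x."""
    return sum(a * b for a, b in zip(y, x))

x = [3.0, -1.5, 0.0, 2.0]

# Candidate norming functional: the sign pattern of x (sup-norm 1).
norming = [1.0 if t >= 0 else -1.0 for t in x]
attained = pair(norming, x)

# Random functionals rescaled to sup-norm 1 stay below ||x||_1.
random.seed(0)
best = 0.0
for _ in range(1000):
    y = [random.uniform(-1.0, 1.0) for _ in x]
    scale = sup_norm(y)
    if scale == 0.0:
        continue
    y = [t / scale for t in y]
    best = max(best, abs(pair(y, x)))

print(l1_norm(x), attained, best)
```

In infinite dimensions the existence of such a norming functional is exactly what Proposition 11.18 supplies; here it can simply be written down.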

Proposition 11.20 Separating Hyperplane Theorem

If $x \in X$ does not lie in the closed ball $B_r[0]$, then there is a ‘hyperplane’ $\phi^{-1}\alpha$ which separates the two, that is,

\[ \exists\phi \in X^*,\ \exists\alpha \in \mathbb{R},\ \forall y \in B_r[0], \quad |\phi y| < \alpha < |\phi x|. \]

Proof Let $\phi : [\![x]\!] \to \mathbb{F}$, $\phi(\lambda x) := \lambda\|x\|$; its norm is 1 and $\phi x = \|x\| > r$. It can be extended to a functional on $X$ with the same norm. Hence for any $y$ in the closed ball, $|\phi y| \le \|\phi\|\,\|y\| \le r$. The hyperplane is then $\lambda x + \ker\phi$ where $r/\|x\| < \lambda < 1$. □

Note that the proof remains valid when $B_r[0]$ is replaced by a closed balanced convex set $C$ since $C + B_\epsilon(0)$ determines a semi-norm in which it is the open unit ball (Exercise 7.7(8)).
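
As a concrete illustration (the numbers are chosen here for convenience and are not from the text), take $X = \mathbb{R}^2$ with the Euclidean norm, $x = (3, 4)$ and $r = 2$, so that $\|x\| = 5 > r$. The functional of the proof is

\[ \phi(y) = \tfrac{1}{5}(3y_1 + 4y_2), \qquad \|\phi\| = 1, \qquad \phi x = 5, \]

and for $\|y\| \le 2$ the Cauchy-Schwarz inequality gives $|\phi y| \le 2$. Any $\alpha$ with $2 < \alpha < 5$ separates, for instance $\alpha = 3$:

\[ |\phi y| \le 2 < 3 < 5 = |\phi x|, \qquad \phi^{-1}(3) = \{\, y : 3y_1 + 4y_2 = 15 \,\} = \tfrac{3}{5}x + \ker\phi, \]

which matches the form $\lambda x + \ker\phi$ with $\lambda = 3/5$ lying between $r/\|x\| = 2/5$ and $1$.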

Examples 11.21 

1. The Hahn-Banach theorem and its corollaries are evident for Hilbert spaces: 

(a) Any functional $\phi$ on a closed subspace $M$ corresponds to a vector $x \in M$, and hence has the obvious extension $\tilde\phi := \langle x, \cdot\rangle$ on $H$.

(b) $x = 0 \Leftrightarrow \forall y \in H,\ \langle y, x\rangle = 0$.

(c) $\|x\| = \sup_{y \neq 0} |\langle y, x\rangle| / \|y\|$.

(d) One hyperplane separating $x$ from $B_r[0]$ is $x^\perp + \alpha x/\|x\|$, $r < \alpha < \|x\|$.

2. Operators do not extend automatically as functionals do: 

(a) If $M$ is a complemented closed subspace of $X$, then every operator $T : M \to Y$ can be extended continuously to $X \to Y$.

(b) If the identity map $I$ on the closed subspace $M$ can be extended to $X \to M$, then $M$ is complemented in $X$.

Proof Let $X = M \oplus N$ with $M, N$ closed subspaces, and define $\tilde T(a + b) := Ta$ for $a \in M$, $b \in N$. Then $\|\tilde T(a + b)\| = \|Ta\| \le c\|T\|\,\|a + b\|$ (Proposition 11.5).

If $\tilde I : X \to M$ is an extension of $I : M \to M$, then $\tilde I^2 x = I\tilde I x = \tilde I x$, so it is a projection in $B(X)$. $X$ then splits up as $\ker\tilde I \oplus \operatorname{im}\tilde I$, where $\operatorname{im}\tilde I = M$.

3. If $X$ is not separable then neither is $X^*$.

Proof Assume $X^*$ separable, with $\phi_1, \phi_2, \ldots$ dense in it. By definition of their norm, there must be (unit) vectors $x_n$ such that for a fixed $\epsilon > 0$,

\[ |\phi_n x_n| > (\|\phi_n\| - \epsilon)\|x_n\|. \]

The claim is that $M := \overline{[\![x_n]\!]}$ is equal to $X$, making $X$ separable. For if not, then there is a unit functional $\phi \in X^*$ such that $\phi M = 0$; and there is a $\phi_n$ close to it, $\|\phi - \phi_n\| < \epsilon$, so

\[ |\phi_n x_n| = |(\phi - \phi_n)x_n| \le \|\phi - \phi_n\|\,\|x_n\| < \epsilon\|x_n\|. \]

Combining the two inequalities yields $\|\phi_n\| < 2\epsilon$, and this contradicts that $\phi_n$ is within $\epsilon$ of the unit functional $\phi$ (which forces $\|\phi_n\| > 1 - \epsilon$) once $\epsilon \le 1/3$.

4. Banach Limits. The functional Lim on $c$ (Exercise 9.4(1)) can be extended (non-uniquely) to a functional on $\ell^\infty$.

Annihilators

Let us explore the duality between $X$ and $X^*$ more closely. The connection between the two is the following construction, which allows us to shuttle between subspaces of $X$ and those of $X^*$. It is the generalization of the orthogonal spaces in Hilbert spaces which, under the Riesz correspondence $J$, can be rewritten in terms of functionals,

\[ A^\perp = \{\, x \in H : \langle x, a\rangle = 0,\ \forall a \in A \,\} \ \longleftrightarrow\ \{\, \phi \in H^* : \phi a = 0,\ \forall a \in A \,\}. \]

Definition 11.22 

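The annihilator of a set of vectors $A \subseteq X$ is the set of functionals

\[ A^\perp := \{\, \phi \in X^* : \phi x = 0,\ \forall x \in A \,\}. \]

Similarly, given a set of functionals $\Phi \subseteq X^*$ then the pre-annihilator is

\[ {}^\perp\Phi := \{\, x \in X : \phi x = 0,\ \forall\phi \in \Phi \,\}. \]

Easy Consequences

1. $0^\perp = X^*$, $X^\perp = 0$.

2. $A \subseteq B \Rightarrow B^\perp \subseteq A^\perp$.

3. $A \subseteq {}^\perp\Phi \Leftrightarrow \Phi A = 0 \Leftrightarrow \Phi \subseteq A^\perp$.

The properties of $A^\perp$ generalize those for Hilbert spaces, such as Proposition 10.9 and Exercise 10.14(4).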

Proposition 11.23

$A^\perp$ is a closed linear subspace of $X^*$ with the following properties:

(i) $(A \cup B)^\perp = A^\perp \cap B^\perp$ and $A^\perp + B^\perp \subseteq (A \cap B)^\perp$,

(ii) ${}^\perp(A^\perp) = \overline{[\![A]\!]}$,

(iii) $[\![A]\!]$ is dense in $X$ $\Leftrightarrow$ $A^\perp = 0$.

Proof That $A^\perp$ is a linear subspace is evident from

\[ \forall\phi, \psi \in A^\perp,\ a \in A,\ \lambda \in \mathbb{F}, \qquad (\phi + \psi)a = \phi a + \psi a = 0, \quad (\lambda\phi)a = \lambda\phi a = 0. \]

Let $\phi_n \to \phi$ with $\phi_n \in A^\perp$; for any $a \in A$, $0 = \phi_n a \to \phi a$, so $\phi \in A^\perp$ and $A^\perp$ is closed.

(i) Clearly, $(A \cup B)^\perp$ is a subset of $A^\perp$ and $B^\perp$, while $\phi A = 0 = \phi B$ imply $\phi(A \cup B) = 0$. If $\phi \in A^\perp$, $\psi \in B^\perp$, and $x \in A \cap B$, then $(\phi + \psi)x = \phi x + \psi x = 0$.

(ii) ${}^\perp(A^\perp)$ is a closed linear subspace of $X$ (Ex. 7 below), and it contains $A$, since for $a \in A$ and any $\phi \in A^\perp$, $\phi a = 0$, so $a \in {}^\perp(A^\perp)$. Thus $\overline{[\![A]\!]} \subseteq {}^\perp(A^\perp)$ (Proposition 7.10).

Conversely, let $x \notin \overline{[\![A]\!]}$. Then by Proposition 11.18, there is a functional $\phi$ satisfying both $\phi\,\overline{[\![A]\!]} = 0$, hence $\phi \in A^\perp$, and $\phi x \neq 0$, hence $x \notin {}^\perp(A^\perp)$.

(iii) Consequently, $[\![A]\!]$ is dense precisely when ${}^\perp(A^\perp) = \overline{[\![A]\!]} = X$, and this is equivalent to $A^\perp = 0$ ($\forall x \in X,\ \phi x = 0 \Leftrightarrow \phi = 0$). □
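
A small finite-dimensional instance may help fix the notation; the particular set is an illustrative choice, not from the text. Let $X = \mathbb{R}^3$, write functionals as row vectors $\phi = (a, b, c)$, and take $A = \{(1,0,0),\ (1,1,0)\}$. Then

\[ \phi \in A^\perp \iff a = 0 \text{ and } a + b = 0 \iff \phi = (0, 0, c), \]

so that

\[ {}^\perp(A^\perp) = \{\, x \in \mathbb{R}^3 : c\,x_3 = 0\ \ \forall c \,\} = \{\, x : x_3 = 0 \,\} = [\![A]\!], \]

as property (ii) predicts (in finite dimensions the span is already closed). Since $A^\perp \neq 0$, property (iii) confirms that $[\![A]\!]$ is not dense in $\mathbb{R}^3$.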


The Double Dual X** 

A functional $\phi$ is an assignment of numbers $\phi x$ as the vectors $x$ vary in $X$. Suppose we fix $x$ and vary $\phi$ instead, $\phi \mapsto \phi x$, what kind of object do we get? It is a mapping from $X^*$ to $\mathbb{F}$, which is a possible candidate for a “double” functional in $X^{**}$.

Proposition 11.24

For any $x \in X$, the map $x^{**}\phi := \phi x$ is a functional on $X^*$, and $x \mapsto x^{**}$ is a linear isometry, embedding $X$ in $X^{**}$.

Proof The mapping $x^{**} : X^* \to \mathbb{F}$, $\phi \mapsto \phi x$, is clearly linear in $\phi$, and continuous with $|x^{**}\phi| = |\phi x| \le \|x\|\,\|\phi\|$, i.e., $x^{**} \in X^{**}$.

Hence we can form the map $J : X \to X^{**}$, defined by $J(x) := x^{**}$. It is linear, since for any $\phi \in X^*$, $x, y \in X$, $\lambda \in \mathbb{F}$,

\[
\begin{aligned}
(x + y)^{**}(\phi) &= \phi(x + y) = \phi x + \phi y = x^{**}(\phi) + y^{**}(\phi), \\
(\lambda x)^{**}(\phi) &= \phi(\lambda x) = \lambda\,\phi x = \lambda x^{**}(\phi).
\end{aligned}
\]

$J$ is isometric by Proposition 11.19, $\|x^{**}\| = \sup_{\|\phi\|=1} |x^{**}\phi| = \sup_{\|\phi\|=1} |\phi x| = \|x\|$. □


Examples 11.25 

1. Given any normed space $X$, the double dual $X^{**}$ is a Banach space. Hence the closure $\overline{JX}$, being a closed linear subspace of $X^{**}$, is itself a Banach space. It is isomorphic to the completion of $X$, denoted by $\widetilde{X}$.

2. Several Banach spaces, called reflexive spaces, have the property that the mapping $x \mapsto x^{**}$ is an isomorphism. Examples include $\ell^p$ ($1 < p < \infty$) and all Hilbert spaces (Proposition 10.16).

3. But in general, $X$ need not be isomorphic to $X^{**}$, even if $X$ is complete. For example, some elements of $(\ell^1)^{**}$ are not of the type $x^{**}$ for any $x \in \ell^1$.

4. In this embedding, $A \subseteq A^{\perp\perp}$ (since for $x \in A$ and $\phi \in A^\perp$, $x^{**}\phi = \phi x = 0$, so $x^{**} \in A^{\perp\perp}$). Note that $A^{\perp\perp}$ is always a closed linear subspace even if $A$ isn’t. Question: if $M$ is a closed linear subspace is it necessarily true that $M = M^{\perp\perp}$?

5. Since a functional is determined by its values on the unit sphere, we can think of the double-functional $x^{**}$ as a continuous function on the unit sphere in $X^*$; its norm is none other than its maximum value there, $\|x^{**}\| = \sup_{\|\phi\|=1} |\phi x|$. Hence the vectors of any normed space can be thought of as continuous functions on a (possibly infinite-dimensional) sphere!

Exercises 11.26 

1. $X^*$ distinguishes points: If $x \neq y$ then there is a $\phi \in X^*$ such that $\phi x \neq \phi y$.

2. If $x \notin [\![y]\!]$, find a functional on $X$ with $\phi x = 1$ and $\phi y = 0$.

3. For normed spaces, $X^* = 0 \Leftrightarrow X = 0$.

4. Show that the functional $\phi x := x$, $x \in \mathbb{R}$, has many equal-norm extensions to $\mathbb{R}^2$ with the 1-norm.

5. The set $\{\, \phi \in X^* : \phi x = \|x\| \,\}$ (for a given $x$) is a non-empty convex subset of $X^*$ (called the set of “tangent functionals” at $x$).

6. Show that if $\{x\}^\perp = X^*$ then $x = 0$, and if $\{x\}^\perp = 0$ then $X = \mathbb{F}$ or $X = 0$.

7. Show ${}^\perp\Phi$ is a closed linear subspace of $X$.

8. $({}^\perp\Phi)^\perp$ need not equal $\overline{[\![\Phi]\!]}$. For example, take $\Phi := \{\, \delta_n : n \in \mathbb{N} \,\}$ in $\ell^{1*}$.

9. Let $M$ be a closed subspace of a normed space $X$. The following maps are isomorphisms

\[
\begin{aligned}
M^\perp &\to (X/M)^*, & \phi &\mapsto \psi, \quad \psi(x + M) := \phi x, \\
X^*/M^\perp &\to M^*, & \phi + M^\perp &\mapsto \phi|_M.
\end{aligned}
\]

Hence, $\dim M^\perp = \operatorname{codim} M$ and $\operatorname{codim} M^\perp = \dim M$, when finite.


11.4 The Adjoint $T^\top$

Recall the adjoint of an operator on Hilbert spaces $T^* : Y \to X$ defined by the identity $\langle T^*y, x\rangle = \langle y, Tx\rangle$. Is there an analogous definition that can be applied to Banach spaces? First, one needs to recast the defining relation, replacing inner products by functionals, $(T^*y)^*x = y^*Tx$. Although not exactly the same thing, the definition $(T^\top\phi)x := \phi Tx$ captures the essentials of this identity in terms of functionals. The relation between them is $T^* : y \mapsto y^* \mapsto T^\top y^* = x^* \mapsto x$.

More formally, using the Riesz correspondences $J_Y : Y \to Y^*$ and $J_X : X \to X^*$,

\[ T^* := J_X^{-1}\,T^\top J_Y. \]

[Diagram: $T : X \to Y$ and $T^* : Y \to X$, with $T^\top : Y^* \to X^*$ linked to them through $J_X$ and $J_Y$.]

$T^*$ is sometimes called the Hilbert adjoint to distinguish it from the adjoint $T^\top$.

Definition 11.27 



The adjoint$^1$ of an operator $T : X \to Y$ is $T^\top : Y^* \to X^*$ defined by $(T^\top\phi)x := \phi(Tx)$ for any $\phi \in Y^*$ and $x \in X$.

That $T^\top\phi : X \to \mathbb{F}$ is linear and continuous can be seen from

\[
\begin{aligned}
T^\top\phi(x + y) &= \phi T(x + y) = \phi Tx + \phi Ty = T^\top\phi(x) + T^\top\phi(y), \\
T^\top\phi(\lambda x) &= \phi T(\lambda x) = \lambda\,\phi Tx = \lambda\,T^\top\phi(x), \\
|(T^\top\phi)x| &= |\phi(Tx)| \le \|\phi\|\,\|Tx\| \le \|\phi\|\,\|T\|\,\|x\|.
\end{aligned}
\tag{11.2}
\]


Proposition 11.28 

$T^\top$ is linear and continuous when $T$ is, and the map $T \mapsto T^\top$ is a linear isometry from $B(X, Y)$ into $B(Y^*, X^*)$,

\[ (S + T)^\top = S^\top + T^\top, \qquad (\lambda T)^\top = \lambda T^\top, \qquad \|T^\top\| = \|T\|. \]

When defined, $(ST)^\top = T^\top S^\top$.


1 There is no standard name or notation for the adjoint operator. It has also been called the dual or transpose and denoted by various symbols such as $T'$, $T^*$, $T^\times$, T ^ etc.

Proof Linearity of $T^\top$: For all $x \in X$, $\phi, \psi \in Y^*$, $\lambda \in \mathbb{F}$,

\[
\begin{aligned}
T^\top(\phi + \psi)(x) &= (\phi + \psi)Tx = \phi Tx + \psi Tx = (T^\top\phi)x + (T^\top\psi)x, \\
T^\top(\lambda\phi)x &= \lambda\,\phi Tx = (\lambda T^\top\phi)x.
\end{aligned}
\]

That $T^\top$ is continuous follows from $\|T^\top\phi\| \le \|T\|\,\|\phi\|$ by (11.2).

The other assertions are implied by the following statements, true for all $x \in X$ and all $\phi \in Y^*$:

\[
\begin{aligned}
(S + T)^\top\phi x &= \phi(Sx + Tx) = \phi Sx + \phi Tx = (S^\top\phi + T^\top\phi)x, \\
(\lambda T)^\top\phi x &= \phi(\lambda Tx) = \lambda\,\phi Tx = (\lambda T^\top\phi)x.
\end{aligned}
\]

Using Proposition 11.19,

\[ \|T\| = \sup_{\|x\|=1}\,\sup_{\|\phi\|=1} |\phi Tx| = \sup_{\|\phi\|=1}\,\sup_{\|x\|=1} |(T^\top\phi)x| = \sup_{\|\phi\|=1} \|T^\top\phi\| = \|T^\top\|. \]

Finally, when $T \in B(X, Y)$, $S \in B(Y, Z)$, and any $\psi \in Z^*$,

\[ (ST)^\top\psi = \psi ST = (S^\top\psi)T = T^\top S^\top\psi. \]

□


Examples 11.29 

1. $0^\top = 0$, $I^\top = I$.

2. The adjoint of a (complex) matrix is its transpose, with the columns becoming the rows, $\phi Tx = y \cdot Tx = (T^\top y) \cdot x$; in general, $\sum_i y_i \sum_j T_{ij}x_j = \sum_j \bigl(\sum_i T_{ij}y_i\bigr)x_j$, so $T^\top_{ji} = T_{ij}$.

3. ► To find the adjoint of an operator $T$ on the sequence spaces $\ell^1$, $\ell^2$, or $c_0$, the effect of $T$ on a vector $x$ needs to be reevaluated as an effect on a functional $\phi$, which recall is associated to a vector $y$ in the dual spaces $\ell^\infty$, $\ell^2$, or $\ell^1$, respectively,

\[ \phi Tx = y \cdot Tx = (T^\top y) \cdot x. \]

For example, to show that the adjoint of the operator $T(a_n) := (a_1, 0, 0, \ldots)$ in $B(\ell^1)$ is $T^\top(b_n) = (0, b_0, 0, \ldots)$ in $B(\ell^\infty)$, consider

\[
\begin{aligned}
y \cdot Tx &= (b_0, b_1, b_2, \ldots) \cdot (a_1, 0, 0, \ldots) \\
&= b_0 a_1 = (0, b_0, 0, \ldots) \cdot (a_0, a_1, a_2, \ldots).
\end{aligned}
\]




4. ► The adjoint of the left-shift operator is the right-shift operator, on $\ell^1$ or $c_0$:

\[ \phi Lx = y \cdot Lx = \sum_{n=0}^\infty b_n a_{n+1} = (0, b_0, b_1, \ldots) \cdot (a_0, a_1, a_2, \ldots) = (Ry) \cdot x. \]

5. The adjoint of the Fourier transform $\mathcal{F} : L^1[0,1] \to c_0$ is $\mathcal{F}^\top : \ell^1 \to L^\infty[0,1]$ defined by $\mathcal{F}^\top(a_n) = \sum_n a_n e^{-2\pi i n x}$. (Compare with $\mathcal{F}^*$, Exercise 10.35(14).)

Proof For $y = (a_n) \in \ell^1$,

\[
\begin{aligned}
y \cdot \mathcal{F}f &= \sum_n a_n \int_0^1 e^{-2\pi i n x} f(x)\,dx \\
&= \int_0^1 \Bigl(\sum_n a_n e^{-2\pi i n x}\Bigr) f(x)\,dx = (\mathcal{F}^\top y) \cdot f
\end{aligned}
\]

with the placement of the sum in the integral justified by $\sum_n a_n e^{-2\pi i n x} \in L^\infty[0,1]$.

* Note that $\ell^1 \subset c_0$, so the composition $\mathcal{F}^\top\mathcal{F}$ is not defined on all of $L^1[0,1]$, i.e., rebuilding an $L^1$-function from its Fourier coefficients is not guaranteed to converge uniformly back to the function. However, with this machinery in place, it is now easy to prove part of Dirichlet’s assertion for periodic functions:

: C 2 [0, 1] -* c 2 (Z) C i l L°°[0, 1] (Exercise 9.27(3)). 

6. * Even if the codomain of $T : X \to Y$ is reduced to a linear subspace $M$ such that $\operatorname{im} T \subseteq M \subseteq Y$, the image of $T^\top$ remains the same.

Proof Let $\tilde T : X \to M$, $\tilde T x := Tx$, be the new operator; then $\tilde T^\top : M^* \to X^*$. Any functional $\phi \in M^*$ can be extended to $\tilde\phi \in Y^*$, and for all $x \in X$

\[ (T^\top\tilde\phi)x = \tilde\phi Tx = \phi\tilde T x = (\tilde T^\top\phi)x. \]

Hence $\operatorname{im}\tilde T^\top \subseteq \operatorname{im} T^\top$. Conversely, any $\phi \in Y^*$ can be restricted to $M$, and the same reasoning shows the opposite inclusion.

7. For a Hilbert space H, every operator T e B(H ) is paired up with its adjoint 
T* e B(H). This fact makes B(H) much more special than spaces of operators 
on Banach spaces, as we shall see later in Chapter 16 on C*-algebras. 
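
The identities $y \cdot Tx = (T^\top y) \cdot x$ of Examples 2 and 4 can be verified on small data. The sketch below uses hypothetical helper names and an arbitrarily chosen real matrix; sequences are represented by finite lists, read as finitely supported sequences padded with zeros.

```python
def dot(y, x):
    """Bilinear pairing y . x; zip stops at the shorter list (zero padding)."""
    return sum(a * b for a, b in zip(y, x))

def apply(T, x):
    """Matrix-vector product with T stored as a list of rows."""
    return [sum(T[i][j] * x[j] for j in range(len(x))) for i in range(len(T))]

def transpose(T):
    return [list(col) for col in zip(*T)]

# Example 2: a 2x3 matrix T : F^3 -> F^2 and its transpose.
T = [[1, 2, 0],
     [0, -1, 3]]
x = [2, 1, -1]
y = [4, 5]
lhs = dot(y, apply(T, x))
rhs = dot(apply(transpose(T), y), x)

# Example 4: left shift L and right shift R on finitely supported sequences.
def left_shift(seq):
    return seq[1:]

def right_shift(seq):
    return [0] + seq

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
shift_lhs = dot(b, left_shift(a))   # y . Lx
shift_rhs = dot(right_shift(b), a)  # (Ry) . x

print(lhs, rhs, shift_lhs, shift_rhs)
```

The pairing here is bilinear (no complex conjugate), which is why the adjoint of a complex matrix is the plain transpose rather than the conjugate transpose of the Hilbert adjoint.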

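The Hilbert space fact $\ker T^* = (\operatorname{im} T)^\perp$ generalizes to Banach spaces, but the closure of $\operatorname{im} T^\top$ is not always $(\ker T)^\perp$.

Proposition 11.30 (Closed Range Theorem)

If $X, Y$ are Banach spaces and $T \in B(X, Y)$, then

\[
\begin{aligned}
\ker T^\top &= (\operatorname{im} T)^\perp, & \ker T &= {}^\perp\operatorname{im} T^\top, \\
\overline{\operatorname{im} T} &= {}^\perp\ker T^\top, & \operatorname{im} T^\top &\subseteq (\ker T)^\perp.
\end{aligned}
\]

Moreover, $\operatorname{im} T^\top = (\ker T)^\perp$ $\Leftrightarrow$ $\operatorname{im} T$ is closed $\Leftrightarrow$ $\operatorname{im} T^\top$ is closed.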


Proof The central statement is, for $T \in B(X, Y)$,

\[ \phi Tx = (T^\top\phi)x. \]

If these quantities vanish for all $x \in X$, then the two sides of the equation state $\phi \in (\operatorname{im} T)^\perp$ and $\phi \in \ker T^\top$, which must therefore be logically equivalent. If they vanish for all $\phi \in Y^*$, then they state $x \in \ker T$ and $x \in {}^\perp\operatorname{im} T^\top$ respectively.

We have already seen that $\Phi \subseteq A^\perp \Leftrightarrow A \subseteq {}^\perp\Phi$; so the statements in the second line of the proposition follow from the identities in the top line, using first $\Phi = \ker T^\top$, $A = \operatorname{im} T$, and secondly $A = \ker T$, $\Phi = \operatorname{im} T^\top$. Moreover (Proposition 11.23),

\[ \overline{\operatorname{im} T} = {}^\perp\bigl((\operatorname{im} T)^\perp\bigr) = {}^\perp(\ker T^\top). \]

$\operatorname{im} T$ closed $\Leftrightarrow$ $\operatorname{im} T^\top$ closed: Suppose $\operatorname{im} T$ is closed. We show that equality holds in $\operatorname{im} T^\top \subseteq (\ker T)^\perp$. Let $\phi \in (\ker T)^\perp$, i.e., $Tx = 0 \Rightarrow \phi x = 0$. $T$ can be considered as an onto operator $T : X \to \operatorname{im} T$, so the mapping $\psi : Tx \mapsto \phi x$ is a well-defined functional on $\operatorname{im} T$ (Exercise 11.7(4)). It can be extended to a functional $\tilde\psi \in Y^*$ by the Hahn-Banach theorem.

Then, for all $x \in X$,

\[ \phi x = \psi Tx = \tilde\psi Tx = (T^\top\tilde\psi)x, \]

so $\phi = T^\top\tilde\psi$ and $\operatorname{im} T^\top$ is equal to the closed subspace $(\ker T)^\perp$.

Conversely, let $\operatorname{im} T^\top$ be closed, and define $\tilde T : X \to \overline{\operatorname{im} T} =: M$, $\tilde T x := Tx$; by Example 11.29(6) above and the fact that the annihilator of $\operatorname{im}\tilde T$ in $M$ is 0, it follows that $\tilde T^\top$ is 1-1 and has the closed image $\operatorname{im} T^\top$. Hence, for all $\phi \in M^*$, $\|\tilde T^\top\phi\| \ge c\|\phi\|$ (Proposition 11.3). Now $C := \overline{\tilde T B_X}$ is a closed balanced convex subset of $M$, so by the separating hyperplane theorem, any $y \in M$ with $y \notin C$ can be separated from it by means of a functional $\psi \in M^*$,

\[ \forall x \in B_X, \qquad |\psi\tilde T x| \le r < |\psi y|. \]

Note that $\|\tilde T^\top\psi\| \le r$. Then

\[ r < |\psi y| \le \|\psi\|\,\|y\| \le \frac{1}{c}\|\tilde T^\top\psi\|\,\|y\| \le \frac{r}{c}\|y\| \]

and $\|y\| > c$. This implies that $\overline{\tilde T B_X}$ contains the ball $B_c(0)$. But we have already seen in the proof of the open mapping theorem that when this is the case, then $\tilde T B_X$ contains some open ball $B_\epsilon(0)$ of $M$. This can only be true if $\tilde T$ is onto, that is, $\operatorname{im}\tilde T = \operatorname{im} T$ is equal to the closed space $M$. □
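
In finite dimensions every image is closed, so all four relations hold with equality and can be checked directly. As an illustration (the matrix is chosen here and is not from the text), let $X = Y = \mathbb{R}^2$, write functionals as row vectors $\phi = (a, b)$, and take

\[ T = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad T(x_1, x_2) = (x_1 + x_2,\ 0), \qquad (T^\top\phi)x = \phi Tx = a(x_1 + x_2), \]

so $T^\top(a, b) = (a, a)$. Then

\[
\begin{aligned}
\operatorname{im} T &= [\![(1,0)]\!], & \ker T^\top &= \{(0, b)\} = (\operatorname{im} T)^\perp, \\
\ker T &= [\![(1,-1)]\!], & \operatorname{im} T^\top &= \{(a, a)\} = (\ker T)^\perp,
\end{aligned}
\]

since a functional $(a, b)$ vanishes on $(1, 0)$ exactly when $a = 0$, and on $(1, -1)$ exactly when $a = b$.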

Proposition 11.31 Schauder’s Theorem 


If $T$ is compact then so is its adjoint $T^\top$.

Proof Let $T : X \to Y$ be a compact operator, so $TB_X$ is totally bounded in $Y$, that is, for arbitrarily small $\epsilon > 0$, it can be covered by a finite number of balls $B_\epsilon(Tx_i)$ where $x_1, \ldots, x_N \in B_X$. We want to show that $T^\top$ maps the unit ball of functionals $B_{Y^*} \subseteq Y^*$ to a totally bounded set of functionals in $X^*$.

The linear map $S : Y^* \to \mathbb{F}^N$ defined by $S\psi := (\psi Tx_1, \ldots, \psi Tx_N)$ is continuous (because $T$ is, and $N$ is finite), so compact of finite-rank. Hence $SB_{Y^*}$ is totally bounded in $\mathbb{F}^N$ and can be covered by balls $B_\epsilon(S\psi_j)$ for a finite number of $\psi_j \in B_{Y^*}$.

[Diagram: $T$ maps $x_i \in B_X$ to $Tx_i \in TB_X$; $T^\top$ maps $\psi_j \in B_{Y^*}$ into $T^\top B_{Y^*}$.]

We now show that balls of radius $4\epsilon$ centered at $T^\top\psi_j$ cover $T^\top B_{Y^*}$. For any $\psi \in B_{Y^*}$ and any $x \in B_X$, there are $Tx_i$ and $S\psi_j$ close to $Tx$ and $S\psi$ respectively, resulting in

\[
\begin{aligned}
|\psi Tx - \psi_j Tx| &\le |\psi Tx - \psi Tx_i| + |\psi Tx_i - \psi_j Tx_i| + |\psi_j Tx_i - \psi_j Tx| \\
&\le \|\psi\|\,\|Tx - Tx_i\| + \|S\psi - S\psi_j\|_{\mathbb{F}^N} + \|\psi_j\|\,\|Tx_i - Tx\| \\
&< \|\psi\|\epsilon + \epsilon + \|\psi_j\|\epsilon \\
&\le 3\epsilon.
\end{aligned}
\]

So $\|T^\top\psi - T^\top\psi_j\| \le 3\epsilon$, and $T^\top B_{Y^*} \subseteq \bigcup_j B_{4\epsilon}(T^\top\psi_j)$. □

Exercises 11.32 

1. The adjoint of a multiplier operator $M_y(x) := yx$, where $M_y \in B(\ell^1)$, is $M_y \in B(\ell^\infty)$.

2. The adjoint of a finite-rank operator $Tx := \sum_{n=1}^N (\phi_n x)e_n$ is another finite-rank operator $T^\top\psi = \sum_{n=1}^N (\psi e_n)\phi_n$.

3. The adjoint is continuous: If $T_n \to T$ then $T_n^\top \to T^\top$.

4. $T$ maps a linear subspace $M$ onto $TM$; show $T^\top$ maps $(TM)^\perp$ into $M^\perp$. So, if $M$ is $T$-invariant, i.e., $TM \subseteq M$, then $M^\perp$ is $T^\top$-invariant.

5. * In the embedding of $X$ in $X^{**}$, show that $T^{\top\top} : X^{**} \to Y^{**}$ is an extension of $T : X \to Y$ in the sense that $T^{\top\top}x^{**} = (Tx)^{**}$.

6. $T^\top$ is 1-1 $\Leftrightarrow$ $\operatorname{im} T$ is dense in $Y$; and $\operatorname{im} T^\top$ is dense in $X^*$ $\Rightarrow$ $T$ is 1-1.

7. Let $T \in B(X, Y)$,

$T$ is an isomorphism $\Leftrightarrow$ $T^\top$ is an isomorphism, with $(T^\top)^{-1} = (T^{-1})^\top$,

$\Leftrightarrow$ $T$ and $T^\top$ are onto.

8. A necessary condition for the equation $Tx = y$ to have a solution in $x$ is that $y$ have the property $T^\top\phi = 0 \Rightarrow \phi y = 0$. When is it also sufficient?

9. If $P$ is a projection, then so is $P^\top$, with kernel $(\operatorname{im} P)^\perp$ and image $(\ker P)^\perp$. Deduce

\[ X = M \oplus N \ \Rightarrow\ X^* = N^\perp \oplus M^\perp. \]

10. If $T$ is a Fredholm operator (Definition 11.12), then so is $T^\top$ and its index is $\operatorname{index}(T^\top) = -\operatorname{index}(T)$. Moreover, $\operatorname{index}(T) = \dim\ker T - \dim\ker T^\top$. (Hint: Exercise 11.26(9).)

11.5 Pointwise and Weak Convergence

We have already encountered two types of convergence for operators $T_n \in B(X, Y)$, to which can be added yet another, weaker, type:

(i) Convergence in norm

\[ T_n \to T \ \Leftrightarrow\ \|T_n - T\| \to 0, \]

(ii) Strong, or pointwise, convergence

\[ T_n x \to Tx \ \ \forall x \in X \ \Leftrightarrow\ \|T_n x - Tx\|_Y \to 0 \ \ \forall x \in X, \]

(iii) Weak convergence

\[ T_n \rightharpoonup T \ \Leftrightarrow\ \phi T_n x \to \phi Tx \ \ \forall x \in X,\ \forall\phi \in Y^*. \]


Examples 11.33 

1. ► Convergence in norm is “stronger” than pointwise convergence, since for each $x \in X$,

\[ \|T_n x - Tx\| = \|(T_n - T)x\| \le \|T_n - T\|\,\|x\| \to 0. \]

But the converse is false: it is possible to have pointwise convergence without convergence in norm. For example, let $\delta_N : \ell^1 \to \mathbb{C}$ be defined by $\delta_N(a_n) := a_N$; then $\delta_N x \to 0$ as $N \to \infty$ for each $x \in \ell^1$, but $\|\delta_N\| = 1$.

Similarly, when defined on $c$, $\delta_N$ converge pointwise to Lim, since $\delta_N(a_n) = a_N \to \lim_{n\to\infty} a_n$, yet $\delta_N \not\to \operatorname{Lim}$ (because $\delta_N = e_N^\top$ can converge only if $e_N$ converge in $\ell^1$).

2. Another example is the projection operator defined as $n$ left shifts followed by $n$ right shifts, $T_n := R^nL^n : \ell^1 \to \ell^1$. It converges pointwise to the 0 operator, since for each $x = (a_i) \in \ell^1$, $\|R^nL^n x\| = \sum_{i=n}^\infty |a_i| \to 0$. However there are sequences, such as $x := e_n$, for which $T_n x = x$, so that $\|T_n\| = 1 \not\to 0$.

3. If $T_n$ converge pointwise, $T_n x \to Tx$, $\forall x$, it does not follow that $T_n^\top$ converge pointwise, $T_n^\top\phi \to T^\top\phi$, $\forall\phi$. For example, in $\ell^1$, $L^n x \to 0$ for the left-shift operator $L$; but $R^n x \not\to 0$ in $\ell^\infty$. Another example is $T_n(a_i) := (a_n, 0, 0, \ldots)$.
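
The gap between pointwise and norm convergence in Examples 1 and 2 can be seen on a concrete sequence. The sketch below is illustrative only: it assumes the vector $x = (2^{-i})$ truncated to 60 terms, and the helper names are hypothetical. It evaluates $\delta_N x$ and $\|R^nL^n x\|_1$ for growing indices and exhibits the unit vectors on which the operators keep norm 1.

```python
def delta(N, x):
    """Coordinate functional delta_N on a finitely supported sequence."""
    return x[N] if N < len(x) else 0.0

def tail_projection(n, x):
    """R^n L^n: replace the first n terms by zeros, keep the tail."""
    return [0.0] * min(n, len(x)) + list(x[n:])

def l1_norm(x):
    return sum(abs(t) for t in x)

def unit(n, length):
    """The basis sequence e_n, indexed from 0."""
    return [1.0 if i == n else 0.0 for i in range(length)]

x = [2.0 ** (-i) for i in range(60)]

# Pointwise convergence to 0 on this fixed x.
tails = [l1_norm(tail_projection(n, x)) for n in (0, 10, 20, 40)]
coords = [delta(N, x) for N in (0, 10, 20, 40)]

# No convergence in norm: e_7 is fixed by T_7 and delta_7(e_7) = 1.
fixed = tail_projection(7, unit(7, 60))
witness = delta(7, unit(7, 60))

print(tails, coords, witness)
```

The same witness works for every index $n$, since $T_n e_n = e_n$ and $\delta_n e_n = 1$ with $\|e_n\|_1 = 1$.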

It often happens that a map is defined as the pointwise limit of a sequence of operators, $T(x) := \lim_{n\to\infty} T_n x$, assuming this is defined for all $x \in X$. It is then natural to ask what properties $T$ enjoys: That it is linear is easy to prove, but is it also necessarily continuous? The answer is yes when $X$ is a Banach space, as follows from the following stronger assertion:




Theorem 11.34 Banach-Steinhaus’s Uniform Bounded theorem

For a Banach space $X$ and $T_i \in B(X, Y)$,

\[ (\forall x \in X,\ \exists C_x > 0,\ \forall i,\ \|T_i x\| \le C_x) \ \Rightarrow\ \exists C > 0,\ \forall i,\ \|T_i\| \le C. \]

(The index set of $i$ need not be countable.)

Proof The sets $A_c := \{\, x \in X : \forall i,\ \|T_i x\| \le c \,\}$ are closed, since if $x_m \in A_c$ and $x_m \to x$, then taking the limit $m \to \infty$ in the inequality $\|T_i x_m\| \le c$, we find $\|T_i x\| \le c$, by continuity of $T_i$ and the norm, showing $x \in A_c$.

The given hypothesis is that $X = \bigcup_{k=1}^\infty A_k$. By Baire’s category theorem, not all these sets can be nowhere dense. That is, there must be at least one $N$ for which $A_N$ contains a ball $B_r(a)$, in fact $B_r[a]$ since $A_N$ is closed,

\[ \forall y \in X, \qquad \|y - a\| \le r \ \Rightarrow\ \forall i,\ \|T_i y\| \le N. \]

Thus

\[ \|y - a\| \le r \ \Rightarrow\ \|T_i(y - a)\| \le \|T_i y\| + \|T_i a\| \le 2N, \]

which can be rewritten as $\bigl\|\tfrac{y-a}{r}\bigr\| \le 1 \Rightarrow \bigl\|T_i\bigl(\tfrac{y-a}{r}\bigr)\bigr\| \le 2N/r$. But every non-zero vector $x$ can be written in this form, $\tfrac{x}{\|x\|} = \tfrac{y-a}{r}$ for a suitable $y$, so for all $i$,

\[ \|T_i x\| \le \frac{2N}{r}\|x\|. \]

□

Corollary 11.35 

If $T_n \in B(X, Y)$ with $X$ a Banach space, and $T_n x \to T(x)$ for all $x$, then $T$ is linear and continuous,

\[ \|T\| \le \liminf_n \|T_n\|. \]

Proof $T$ is necessarily linear, by continuity of addition and scalar multiplication (see the proof of Theorem 8.7). Any convergent sequence is bounded, so $\|T_n x\|$ is bounded for each $x$, from which follows that $\forall n$, $\|T_n\| \le C$, by the uniform bounded theorem.

If we now choose a subsequence of $T_n$, for which $\|T_n\| \to \alpha := \liminf_n \|T_n\|$, and take the limit $n \to \infty$ of $\|T_n x\| \le \|T_n\|\,\|x\|$, we get $\|Tx\| \le \alpha\|x\|$ and $\|T\| \le \alpha$. □

Examples 11.36

1. The uniform bounded theorem can be restated as: If $\|T_n\| \to \infty$ then there must be a vector $x$ such that $\|T_n x\| \to \infty$.

* In fact the set of such $x$ is dense in $X$: If any $A_k$ of the proof contains a ball, the conclusion of the theorem would hold; so with all $A_k$ nowhere dense in $X$, the complement of $\bigcup_k A_k$ is dense (Remark 4.22(1)).

2. A common error is to define or prove $Tx = \sum_n T_n x$ for all $x$ and then deduce $T = \sum_n T_n$. It is true that two functions are the same, $f = g$, when $f(x) = g(x)$ for all $x \in X$, but the point is that the meaning of the limit in the sum $\sum_n$ differs in the two expressions, the first occurring in $Y$ and the second in $B(X, Y)$.

3. * Let $S_N f := \sum_{n=-N}^{N} \hat f(n)e^{2\pi i n x}$, where $x \in [0,1]$ is fixed and $f \in C[0,1]$. Show (a) $S_N$ is a functional on $C[0,1]$, (b) $S_N f = D_N * f$, where

\[ D_N(x) := \sum_{n=-N}^{N} e^{2\pi i n x} = \frac{\sin(2N+1)\pi x}{\sin\pi x} \]

is called the Dirichlet kernel, and (c) $\|S_N\| = \|D_N\|_{L^1}$. Assuming one can show that $\int_0^1 |D_N(x)|\,dx \to \infty$, use the uniform boundedness theorem to deduce that there is a dense set of continuous functions $f$ for which the Fourier series does not converge at $x$, so $S_N f \not\to f$ in $C[0,1]$.
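
The growth of $\int_0^1 |D_N(x)|\,dx$ assumed in Example 3 can be observed numerically. The following is a rough sketch using a midpoint Riemann sum; the sample count and the chosen values of $N$ are arbitrary assumptions, and the function names are hypothetical. It is evidence for the divergence, not a proof of it.

```python
import math

def dirichlet(N, x):
    """Dirichlet kernel D_N(x) = sin((2N+1) pi x) / sin(pi x)."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-12:
        # Limiting value at integer x, where the quotient is 0/0.
        return 2.0 * N + 1.0
    return math.sin((2 * N + 1) * math.pi * x) / s

def l1_norm_dirichlet(N, samples=20000):
    """Midpoint-rule estimate of the integral of |D_N| over [0, 1]."""
    h = 1.0 / samples
    return h * sum(abs(dirichlet(N, (k + 0.5) * h)) for k in range(samples))

norms = [l1_norm_dirichlet(N) for N in (1, 4, 16, 64)]
print(norms)
```

The estimates increase slowly with $N$, consistent with the logarithmic growth usually quoted for these norms.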


Weak Convergence 

Let us now consider weak convergence of operators

\[ T_n \rightharpoonup T \ \Leftrightarrow\ \phi T_n x \to \phi Tx \ \ \forall x \in X,\ \forall\phi \in Y^*. \]

For vectors (considered as operators $\mathbb{F} \to X$, $\lambda \mapsto \lambda x$), weak convergence takes the form

\[ x_n \rightharpoonup x \ \Leftrightarrow\ \phi x_n \to \phi x, \ \ \forall\phi \in X^*. \]

For functionals ($X \to \mathbb{F}$), this convergence is called weak* convergence and coincides with their pointwise convergence,

\[ \phi_n \overset{*}{\rightharpoonup} \phi \ \Leftrightarrow\ \phi_n x \to \phi x, \ \ \forall x \in X. \]

One must guard against a possible source of confusion: the weak convergence of functionals, when thought of as vectors in $X^*$, is different

\[ \phi_n \rightharpoonup \phi \ \Leftrightarrow\ \psi\phi_n \to \psi\phi, \ \ \forall\psi \in X^{**}, \]

hence the need for a new name.

Examples 11.37 

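1. Strong convergence implies weak convergence because, by continuity of $\phi$,

\[ T_n x \to Tx \ \Rightarrow\ \phi T_n x \to \phi Tx. \]

2. ► But the converse is false in general: For example, in $c_0$, $R^n \rightharpoonup 0$, since for any $x = (a_i) \in c_0$, and $y = (b_i) \in \ell^1 \equiv c_0^*$,

\[ |y \cdot R^n x| = \Bigl|\sum_{i=0}^\infty b_{i+n}a_i\Bigr| \le \sum_{i=n}^\infty |b_i|\,\|x\| \to 0 \ \text{ as } n \to \infty, \]

yet $R^n x \not\to 0$,

\[ \|R^n x\| = \|(0, \ldots, 0, a_0, a_1, \ldots)\|_{c_0} = \|x\| \not\to 0. \]

3. To prove weak convergence, $x_n \rightharpoonup x$, given that $(x_n)$ is bounded in $X$, it is enough to check $\psi x_n \to \psi x$ for $\psi$ in a dense subset of $X^*$.

Proof Any $\phi \in X^*$ can be approximated by functionals $\psi_n \to \phi$, by their density in $X^*$. For $y_n := x_n - x$ (bounded), it is not hard to show that $\psi_n y_n \to 0$, so

\[ \phi y_n = \psi_n y_n + (\phi - \psi_n)y_n \to 0 \ \text{ as } n \to \infty. \]

4. Weak convergence of vectors and operators in an inner product space become

\[
\begin{aligned}
x_n \rightharpoonup x \ &\Leftrightarrow\ \langle y, x_n\rangle \to \langle y, x\rangle \ \text{ as } n \to \infty,\ \forall y \in X, \\
T_n \rightharpoonup T \ &\Leftrightarrow\ \langle y, T_n x\rangle \to \langle y, Tx\rangle \ \text{ as } n \to \infty,\ \forall x, y \in X.
\end{aligned}
\]

5. In an inner product space,

\[ x_n \rightharpoonup x \ \text{ AND } \ \|x_n\| \to \|x\| \ \Leftrightarrow\ x_n \to x. \]

Proof When $x_n \rightharpoonup x$, we get $\langle x, x_n\rangle \to \langle x, x\rangle$ since $x^*$ is a functional, so

\[ \|x - x_n\|^2 = \|x\|^2 - 2\operatorname{Re}\langle x, x_n\rangle + \|x_n\|^2 \to 0. \]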
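
Example 2 lends itself to a direct computation. The sketch below assumes the particular finitely supported sequences $x = (1/(i+1))_{i<50} \in c_0$ and $y = (2^{-i})_{i<200} \in \ell^1$, chosen only for illustration, with hypothetical helper names. The pairings $y \cdot R^n x$ shrink while the sup-norm of $R^n x$ stays fixed.

```python
def right_shift(x, n):
    """R^n x: prepend n zeros to a finitely supported sequence."""
    return [0.0] * n + list(x)

def pairing(y, x):
    """y . x for finitely supported sequences (zip pads with zeros)."""
    return sum(b * a for b, a in zip(y, x))

def sup_norm(x):
    return max(abs(t) for t in x)

x = [1.0 / (i + 1) for i in range(50)]   # element of c_0
y = [2.0 ** (-i) for i in range(200)]    # element of l^1, viewed in c_0^*

shifts = (0, 5, 10, 20, 40)
values = [abs(pairing(y, right_shift(x, n))) for n in shifts]
norms = [sup_norm(right_shift(x, n)) for n in shifts]

print(values)  # decreasing towards 0
print(norms)   # constant, equal to the sup-norm of x
```

A single $y$ only tests one functional, of course; weak convergence requires the same behaviour for every $y \in \ell^1$, which is what the tail estimate in the example provides.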


Proposition 11.38 


In finite dimensions, all three convergence types are equivalent.

Proof Let $A_n \rightharpoonup A$ where $A_n$, $A$ are $M \times N$ matrices. This means that for any $\phi \in (\mathbb{F}^M)^*$ and $x \in \mathbb{F}^N$, $\phi(A_n - A)x \to 0$ as $n \to \infty$. In particular if we let $\phi = e_i^\top$, $x = e_j$ be basis vectors for $(\mathbb{F}^M)^*$ and $\mathbb{F}^N$ respectively, then each component of $A_n$ converges to the corresponding component in $A$:

\[ A_{n,ij} = e_i^\top A_n e_j \to e_i^\top A e_j = A_{ij}, \ \text{ as } n \to \infty. \]

This then implies that $\|A_n - A\| \le \sqrt{\sum_{ij} |A_{n,ij} - A_{ij}|^2} \to 0$ (Example 8.9(2)). □

The analogous result of the Banach-Steinhaus theorem for weak convergence is also true, but more care is needed: Although every convergent sequence is bounded (Example 4.3(5)), that fact was proved using a metric, whereas weak convergence $T_n \rightharpoonup T$ is not equivalent, in general, to such a strong type of convergence as $d(T_n, T) \to 0$ for any distance function.

Proposition 11.39 


If $T_n \rightharpoonup T$ where $T_n \in B(X, Y)$, $X$ a Banach space, then

(i) $\{\, T_n : n \in \mathbb{N} \,\}$ is bounded, and

(ii) $T \in B(X, Y)$ with $\|T\| \le \liminf \|T_n\|$.

Proof (i) Let $T_n \rightharpoonup T$; the set $\{\, T_1 x, T_2 x, \ldots \,\}$ is weakly bounded in the sense that for all $n \in \mathbb{N}$, $\phi \in Y^*$, $|\phi T_n x| \le C_{x,\phi}$ since $(\phi T_n x)$ is a convergent sequence in $\mathbb{C}$. But an application of the uniform bounded theorem twice shows first that $\|T_n x\| \le C_x$, and then that $T_n$ is bounded. Of course, a simplified version of this argument applies equally well to weakly convergent sequences of vectors $x_n \rightharpoonup x$ and to weak*-convergent sequences of functionals $\phi_n \overset{*}{\rightharpoonup} \phi$.

(ii) Take the limit of $\phi T_n(x + y) = \phi T_n x + \phi T_n y$ and $\phi T_n(\lambda x) = \lambda\phi T_n x$ to show linearity of $T$. Similarly, the set $\{\, \|T_n\| : n \in \mathbb{N} \,\}$ is bounded in $\mathbb{R}$ and possesses a smallest limit point $\alpha$, so taking a subsequence of $\|T_n\|$ which converges to it, we obtain

\[
\begin{array}{ccc}
|\phi T_n x| & \le & \|\phi\|\,\|T_n\|\,\|x\| \qquad \forall x \in X,\ \phi \in Y^* \\
\downarrow & & \downarrow \\
|\phi Tx| & \le & \alpha\|\phi\|\,\|x\|
\end{array}
\]

and $\|T\| \le \alpha$ follows. Thus $B(X, Y)$ is closed under weak convergence. □


As a partial converse there is

Theorem 11.40

When $X$ is a separable Banach space, every bounded sequence in $X^*$ has a weak*-convergent subsequence.

If $x_1, x_2, \ldots \in X$ are dense in the unit ball, then $X^*$ has a norm

\[ \|\phi\|_w := \sum_{n=1}^\infty \frac{1}{2^n}|\phi x_n| \le \|\phi\| \]

such that for $\phi_n$ bounded,

\[ \phi_n \overset{*}{\rightharpoonup} \phi \ \Leftrightarrow\ \|\phi_n - \phi\|_w \to 0. \]

Thus the unit closed ball of $X^*$ is a compact metric space with this norm.

This theorem can be generalized to non-separable spaces (see [10]), when it is known as the Banach-Alaoglu theorem: The unit closed ball of $X^*$ is a compact topological space.
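
To get a feel for the weighted norm, the sketch below evaluates it on coordinate functionals. It makes a simplifying assumption that departs from the theorem: the test vectors are taken to be the basis sequences $e_1, \ldots, e_K$, which are not dense in the unit ball (their span is only dense in spaces such as $c_0$), so this is a stand-in for $\|\cdot\|_w$ rather than the norm itself. All names are hypothetical.

```python
def weighted_norm(phi, test_vectors):
    """Sum of |phi(x_k)| / 2^k over the given test vectors, k = 1, 2, ..."""
    return sum(abs(phi(x)) / 2.0 ** k
               for k, x in enumerate(test_vectors, start=1))

def unit_vector(k, length):
    """The basis sequence e_k, indexed from 1."""
    return [1.0 if i == k - 1 else 0.0 for i in range(length)]

def coordinate_functional(n):
    """phi_n(x) := n-th coordinate of x (indexed from 1); its norm is 1."""
    return lambda x: x[n - 1]

K = 40
tests = [unit_vector(k, K) for k in range(1, K + 1)]
w = [weighted_norm(coordinate_functional(n), tests) for n in (1, 5, 10, 20)]
print(w)
```

Each coordinate functional has norm 1, yet its weighted value is $2^{-n}$, reflecting that the coordinate functionals converge to 0 pointwise but not in norm.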

Proof (i) Let $\{x_m\}$ be a countable dense subset of $X$, and suppose $\|\phi_n\| \le c$. Then the sequence of complex numbers $\phi_n x_1$ is bounded, $|\phi_n x_1| \le c\|x_1\|$, and so must have a convergent subsequence (Exercise 6.9(6)), which we shall denote by $\phi_{1,n}x_1 \to \psi(x_1)$. This subsequence is also bounded on $x_2$, $|\phi_{1,n}x_2| \le c\|x_2\|$, and so we can extract, by the same means, a convergent sub-subsequence, $\phi_{2,n}x_2$. Notice that, not only does $\phi_{2,n}x_2 \to \psi(x_2)$ but also $\phi_{2,n}x_1 \to \psi(x_1)$. Continuing this way, we get subsequences $\phi_{m,n}$ and numbers $\psi(x_m)$ such that $\phi_{m,n}x_i \to \psi(x_i)$, for $i \le m$, and $|\psi(x_m)| \le c\|x_m\|$.


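[Table illustrating the nested subsequences and the diagonal sequence:]

$\phi_n$: $\phi_1\ \phi_2\ \phi_3\ \phi_4\ \phi_5\ \ldots$
$\phi_{1,n}$: $\phi_1\ \phi_3\ \phi_4\ \phi_5\ \ldots$, with $\phi_{1,n}x_1 \to \psi(x_1)$
$\phi_{2,n}$: $\phi_1\ \phi_3\ \phi_5\ \ldots$, with $\phi_{2,n}x_2 \to \psi(x_2)$
$\phi_{3,n}$: $\ldots\ \phi_5\ \ldots$, with $\phi_{3,n}x_3 \to \psi(x_3)$
$\phi_{k,k}$: $\phi_1\ \phi_3\ \ldots$, with $\phi_{k,k}x_m \to \psi(x_m)$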


Let $\psi_k := \phi_{k,k}$, a subsequence of the original sequence $\phi_n$. In fact, $\psi_k$ is a subsequence of every $\phi_{m,n}$ from some point onward ($k \ge m$), so $\psi_k x_m \to \psi(x_m)$, as $k \to \infty$. This implies that the function $\psi$ is Lipschitz on the dense set $\{x_m\}$,

\[ |\psi(x_i) - \psi(x_j)| = \lim_{k\to\infty} |\psi_k x_i - \psi_k x_j| = \lim_{k\to\infty} |\phi_{k,k}(x_i - x_j)| \le c\|x_i - x_j\| \]

and so can be extended uniformly to a continuous function on $X$ (Theorem 4.13), still satisfying $|\psi(x)| \le c\|x\|$. It is linear, as seen by taking the limit $k \to \infty$ of $\psi_k(x + y) = \psi_k x + \psi_k y$ and $\psi_k(\lambda x) = \lambda\psi_k x$.

Now, for any $\epsilon > 0$, there is an $x_m$ close to $x$, $\|x_m - x\| < \epsilon$, so that

\[
\begin{aligned}
\exists K \in \mathbb{N},\ k \ge K \ &\Rightarrow\ |\psi_k x_m - \psi x_m| < \epsilon \\
&\Rightarrow\ |\psi_k x - \psi x| \le |\psi_k x - \psi_k x_m| + |\psi_k x_m - \psi x_m| + |\psi x_m - \psi x| < (2c + 1)\epsilon,
\end{aligned}
\]

in other words $\psi_k x \to \psi x$ for all $x$, or $\psi_k \overset{*}{\rightharpoonup} \psi$, as $k \to \infty$.

(ii) That $\|\phi\|_w$ is well-defined and bounded by $\|\phi\|$ follows from $|\phi x_n| \le \|\phi\|\|x_n\| \le \|\phi\|$;
that it is a norm follows from $|\phi x_n + \psi x_n| \le |\phi x_n| + |\psi x_n|$ and $|\lambda\phi x_n| =
|\lambda||\phi x_n|$, as well as

\[ 0 = \|\phi\|_w = \sum_{n=1}^{\infty} \frac{1}{2^n}|\phi x_n| \ \Rightarrow\ \forall n,\ |\phi x_n| = 0 \ \Rightarrow\ \phi = 0 \]

since $\{x_n\}$ is dense in $B_X$.

(iii) When $\|\phi_n\| \le c$, $\phi_n \rightharpoonup \phi \iff \|\phi_n - \phi\|_w \to 0$: It is enough to consider
functionals $\phi_n$ such that $\phi_n \rightharpoonup 0$. Let $\epsilon > 0$ and $M$ large enough that $1/2^M < \epsilon$. For
all $m$, $\phi_n x_m \to 0$ as $n \to \infty$; this convergence may not be uniform in $m$, but it will
be for the first $M$ points $x_1, \ldots, x_M$, i.e.,

\[ \exists N,\quad n \ge N \Rightarrow |\phi_n x_m| < \epsilon, \quad \forall m = 1, \ldots, M. \]

So $\|\phi_n\|_w \to 0$, because for $n \ge N$,

\[ \|\phi_n\|_w = \sum_{m=1}^{\infty} \frac{1}{2^m}|\phi_n x_m| \le \sum_{m=1}^{M} \frac{1}{2^m}\,\epsilon + \sum_{m=M+1}^{\infty} \frac{1}{2^m}\,\|\phi_n\|\|x_m\| < (1 + c)\epsilon. \]

Conversely, let $\phi_n$ be bounded functionals such that $\|\phi_n\|_w \to 0$. This implies
that for any fixed $m$,

\[ \frac{1}{2^m}|\phi_n x_m| \le \sum_{m=1}^{\infty} \frac{1}{2^m}|\phi_n x_m| \to 0, \quad \text{as } n \to \infty, \tag{11.3} \]

so $\phi_n x_m \to 0$. For any non-zero $x \in X$, choose $x_m$ close to within $\epsilon$ of $y := x/\|x\|$. This is
possible because $\{x_m\}$ are dense in the unit ball. Then, for $n$ large enough,

\[ |\phi_n y| \le |\phi_n x_m| + |\phi_n(x_m - y)| \le \epsilon + c\epsilon \]

\[ \Rightarrow\ |\phi_n x| \le (1 + c)\|x\|\epsilon. \]

Hence $\phi_n x \to 0$ for any $x$ and so $\phi_n \rightharpoonup 0$.

(iv) $B_{X^*}$ is compact with respect to $\|\cdot\|_w$: Every sequence $\phi_n$ in $B_1[0]$ has a weak*-
convergent subsequence by (i), i.e., $\|\phi_n - \phi\|_w \to 0$. For any $x \in X$,

\[ |\phi x| = \lim_{n\to\infty} |\phi_n x| \le \|\phi_n\|\|x\| \le \|x\|, \]

so $\|\phi\| \le 1$, and $B_1[0]$ has the Bolzano-Weierstraß property of compactness (Theorem 6.21).
Note carefully, however, that it is not necessarily compact in the standard
norm of $X^*$; only in finite dimensions are balls totally bounded (Proposition 8.23).

□


Examples 11.41 

1. In a Banach space, if $x_n \rightharpoonup x$ and $\{x_n : n \in \mathbb{N}\}$ is totally bounded, then $x_n \to x$.

Proof Every sequence in a totally bounded set has a Cauchy subsequence, so $x_{n_i} \to y \in X$.
By continuity, $\phi x_{n_i} \to \phi y = \phi x$ for any functional $\phi$, hence $x = y$. If $x_n \not\to x$,
then one could find another Cauchy subsequence converging to $y \ne x$, which is
impossible.

2. A subset $A \subseteq X$ is said to be weakly bounded when $\forall\phi \in X^*$, $\phi A$ is bounded. It
turns out that $A$ is weakly bounded $\iff$ $A$ is bounded.

Proof Given that $|\phi a| \le R_\phi$ for all $a \in A$ and $\phi \in X^*$, recall that $\phi a = a^{**}\phi$,
where $a^{**} : X^* \to \mathbb{F}$, so the uniform bounded theorem can be used to yield
$\|a\| = \|a^{**}\| \le C$.

The idea of using functionals to transfer sets in $X$ to sets in $\mathbb{F}$ is so convenient and
useful that it is applied, not just to convergence, but to various other properties.
In a general sense, we say that a set $A \subseteq X$ is weakly $\mathcal{P}$ when for all $\phi \in X^*$, $\phi A$
has the property $\mathcal{P}$.

3. A vector $x$ is a weak limit point of a subset $A$ when for any $\phi \in X^*$, every open
ball in $\mathbb{F}$ which contains $\phi x$ also contains another point $\phi a$ for $a \in A$, $a \ne x$. $A$ is
said to be weakly closed when it contains all its weak limit points. Every weakly
closed set is closed, since $x_n \to x \Rightarrow x_n \rightharpoonup x$.

4. If $T$ is linear and $\phi T$ is continuous for each $\phi \in Y^*$ (i.e., $x_n \to x \Rightarrow Tx_n \rightharpoonup Tx$),
then in fact $T$ is continuous.

Proof For every bounded set $B$, $\phi TB$ is bounded by continuity. So $TB$ is weakly
bounded, which is the same as bounded.

5. A Hilbert space is weakly complete, i.e., if $(\phi x_n)$ is Cauchy in $\mathbb{F}$ for each $\phi \in H^*$,
then $x_n \rightharpoonup x$ for some $x$.

Proof Let $\phi(y) := \lim_{n\to\infty} \langle x_n, y\rangle$; $\phi$ is linear and continuous by the uniform
bounded theorem, so must be of the form $\phi = \langle x, \cdot\rangle$ and $\langle x_n, y\rangle \to \langle x, y\rangle$ for
each $y$.




6. * Closed and bounded sets of a Hilbert space are weakly sequentially compact,
meaning any bounded sequence has a weakly convergent subsequence.

Proof Let $M := \overline{[\![x_1, x_2, \ldots]\!]}$. Then $M^* \cong M$, so Theorem 11.40 can be used
to conclude that there is a subsequence $x_{n_i}$ that converges weakly in $M$, i.e.,
$\langle a, x_{n_i}\rangle \to \langle a, x\rangle$ for all $a \in M$. But in fact, for any vector $y \in H$ and its
orthogonal projection $Py \in M$, $\langle y, x_{n_i}\rangle = \langle Py, x_{n_i}\rangle \to \langle Py, x\rangle = \langle y, x\rangle$.

7. If $x_n \rightharpoonup x \Rightarrow Tx_n \to Tx$, then $T$ is compact.

Proof If $B$ is a bounded set and $Tx_n$ any sequence in $TB$, then $x_n \in B$ has
a weakly convergent subsequence by the note above; by hypothesis its image
converges, $Tx_{n_i} \to Tx$, and is thus a Cauchy sequence in $TB$. $TB$ is therefore
totally bounded.

8. * The “Least Distance Theorem” 10.11 can be generalized to when $M$ is weakly
closed. (Note that closed convex subsets are weakly closed.)

Proof The sequence $(y_n)$ of the theorem is bounded, hence has a weakly convergent
subsequence $y_{n_i} \rightharpoonup y_* \in M$. Moreover $\|y_n - x\| \to d$. Taking the limit of
$|\langle y_{n_i} - x, y_* - x\rangle| \le \|y_{n_i} - x\|\|y_* - x\|$ gives $\|y_* - x\| \le d$.

Exercises 11.42 

1. Show $e_n \rightharpoonup 0$ in $c_0$ or $\ell^2$, yet $e_n \not\to 0$.

2. (a) For $\ell^1$, $\ell^2$, and $\ell^\infty$, if $x_n = (a_{n,i}) \rightharpoonup x = (a_i)$ then each component
converges $a_{n,i} \to a_i$ as $n \to \infty$. But the converse is false; e.g. $e_n \not\rightharpoonup 0$ in
$\ell^1$.

(b) For $\ell^2$, $x_n \rightharpoonup x$ if, and only if, $x_n$ are bounded and each component
converges, $a_{n,i} \to a_i$. (Hint: approximate any $\phi$ by a finite sum $\sum_j b_j e_j$.) Can you
generalize this to $\ell^p$ ($1 < p < \infty$)?

3. In $L^1[0, 1]$, the functions $f_n(x) := e^{2\pi i n x}$ converge weakly $f_n \rightharpoonup 0$, but not
pointwise, $f_n(x) \not\to 0$ at any $x$ (see Theorem 9.25).

4. ► The weak limit of $T_n$, if it exists, is unique. A subsequence of $T_n$ also converges
to the same weak limit.

5. ► If $x_n \rightharpoonup x$ then $Tx_n \rightharpoonup Tx$, for $T \in B(X, Y)$.

6. Show that the norm is not continuous with respect to weak convergence, by
finding a sequence in $c_0$ such that $x_n \rightharpoonup x$ yet $\|x_n\| \not\to \|x\|$. Similarly, the inner
product of a Hilbert space is not weakly continuous: $x_n \rightharpoonup x$, $y_n \rightharpoonup y$ do not
imply $\langle x_n, y_n\rangle \to \langle x, y\rangle$.

7. In a Hilbert space with an orthonormal basis $e_n$,

(a) $e_n \rightharpoonup 0$,

(b) $\sum_n \alpha_n e_n \rightharpoonup x \iff \sum_n \alpha_n e_n \to x$.

(Hint: The series is bounded, by Proposition 11.39, i.e., $\|\alpha_1 e_1 + \cdots + \alpha_n e_n\|^2
\le c$ and so $(\alpha_n) \in \ell^2$; or use Example 11.37(5).)

8. If $\phi_n \rightharpoonup \phi$ weakly in $X^*$, then $\phi_n \to \phi$ weak* (that is, $\phi_n x \to \phi x$ for all $x \in X$).

9. * Schur’s theorem: In $\ell^1$, weak convergence of $x_n$ is the same as convergence in
norm. Prove this as follows:

(a) If the statement were false there would be unit $x_n = (a_{n,i}) \in \ell^1$ such that
$x_n \rightharpoonup 0$.

(b) For each $n$ there is an $N_n$ such that $\sum_{i \le N_n} |a_{n,i}| > \frac{4}{5}$.

(c) Each coefficient converges to 0 as $n \to \infty$, so

\[ \forall k,\ \exists M,\quad n \ge M \Rightarrow \sum_{i<k} |a_{n,i}| < \frac{1}{5}. \]

(d) A subsequence of $(x_n)$ exists with

\[ \sum_{i<N_{n-1}} |a_{n,i}| < \frac{1}{5}, \qquad \sum_{i=N_{n-1}}^{N_n} |a_{n,i}| > \frac{3}{5}, \qquad \sum_{i>N_n} |a_{n,i}| < \frac{1}{5}. \]

(e) Let $y := (|a_{n,i}|/a_{n,i}) \in \ell^\infty$ where for each $i$, $n$ is such that $N_{n-1} \le i < N_n$.
Show $|y \cdot x_n| \ge \frac{1}{5}$ to obtain a contradiction.

10. Addition and scalar multiplication are continuous with respect to weak
convergence, that is, if $T_n \rightharpoonup T$ and $S_n \rightharpoonup S$ then $T_n + S_n \rightharpoonup T + S$, and
$\lambda T_n \rightharpoonup \lambda T$. Of course, they are also continuous with respect to norm-wise and
strong convergence.

11. Multiplication of operators is not continuous with respect to weak convergence.
The most that can be said is

(a) if $T_n \rightharpoonup T$ then $T_n S \rightharpoonup TS$ and $ST_n \rightharpoonup ST$,

(b) if $T_n \rightharpoonup T$ and $S_n x \to Sx$ for all $x$, then $T_n S_n \rightharpoonup TS$,

(c) if $\forall\phi \in X^*$, $\phi S_n \to \phi S$ and $T_n \rightharpoonup T$ then $S_n T_n \rightharpoonup ST$.

12. (a) For Banach spaces, if $T_n^\top \rightharpoonup T^\top$ then $T_n \rightharpoonup T$ (but not conversely).

(b) For Hilbert spaces, if $T_n \rightharpoonup T$ then $T_n^* \rightharpoonup T^*$ (the adjoint map is weakly continuous).

13. If $T$ is compact then $x_n \rightharpoonup x \Rightarrow Tx_n \to Tx$. (Hint: $\{x_n\}$ must be bounded.)

14. If $x_n \rightharpoonup x$ in $X$, then $\phi \mapsto (\phi x_n)$ maps $X^*$ into $c$. For example, when $X$ is $\ell^1$,
this map converts bounded sequences to convergent ones.

15. Every closed linear subspace is weakly closed (by Proposition 11.18). Thus, if
$x_n \rightharpoonup x$, then there is a sequence $y_n \in [\![x_1, x_2, \ldots]\!]$ which converges in norm,
$y_n \to x$.

16. A set in $X^*$ is weak*-closed when it contains all weak*-limit points; for example,
$A^\perp$.

17. The strong limit of unitary isomorphisms $U_n$ between two Hilbert spaces is an
isometry $U$. But $U$ need not be unitary; e.g. let $U_n$ be defined on $\ell^2$ by

\[ (a_1, a_2, \ldots) \mapsto (a_{n+1}, a_1, a_2, \ldots, a_n, a_{n+2}, a_{n+3}, \ldots). \]

Then $U_n$ converges strongly to the right-shift operator $R$.

18. The Hadamard matrices are defined recursively by

\[ T_1 := \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad T_{n+1} := \begin{pmatrix} T_n & T_n \\ T_n & -T_n \end{pmatrix}. \]

$S_n := T_n/2^{n/2}$ are $2^n \times 2^n$ unitary matrices; they can be extended
to unitary operators on $\ell^2$ by $U_n x := S_n x$ when $x \in M_n := [\![e_0, \ldots, e_{2^n-1}]\!]$,
and $U_n x := x$ when $x \in M_n^\perp$, and then $U_n \rightharpoonup 0$.

19. If a sequence of unitary isomorphisms $U_n$ converges weakly to $U$, then $\|U\| \le 1$.
If $U$ is known to be unitary, then the convergence is pointwise. (Hint: expand
$\|U_n x - Ux\|^2$.)


Remarks 11.43 

1. Not every closed subspace of a Banach space need be “complemented”, e.g. the
space $\ell^\infty \ne c_0 \oplus M$ for any closed linear subspace $M$ (see Proposition 9.2 for the
definition of $c_0$) (see [38]). Indeed there exist infinite-dimensional Banach spaces
whose only complemented subspaces are the finite-dimensional or finite-codimensional
closed ones [42].

2. It is a theorem that Hilbert spaces are the only Banach spaces in which every 
closed subspace is complemented [40]. 

3. Weak convergence does not obey all the convergence properties of metric spaces. 
For example, not every weak limit point of a set M need have a sequence in M 
that converges weakly to it. 

4. There are yet other types of convergence. For example, $B(X, Y)$ is itself a Banach
space, and so there is weak convergence with respect to $B(X, Y)^*$, meaning
$\Phi T_n \to \Phi T$ for all $\Phi \in B(X, Y)^*$.


Chapter 12 

Differentiation and Integration 


12.1 Differentiation 

Although continuous linear transformations are stressed throughout the book, with
good reason, for they are the morphisms of normed spaces, they represent, of course,
a very special part of all the functions from one normed space to another. To put things
in perspective, recall that the linear maps on $\mathbb{R}$ are $x \mapsto \lambda x$, a very restricted set
of functions in comparison with the non-linear real continuous functions. However,
the linear maps are still relevant for one class of continuous functions: maps that
are ‘locally linear’, meaning that they can be approximated by linear operators up to
second-order errors:

Definition 12.1 


A function $f : X \to Y$ between normed spaces (over the same field) is said
to be (Fréchet) differentiable at $x$ when there is a continuous linear map
$f'(x) \in B(X, Y)$ such that for $h$ in a neighborhood of 0,

\[ f(x + h) = f(x) + f'(x)h + o(h) \]

where $\|o(h)\|/\|h\| \to 0$ as $h \to 0$.


Note that $f$ need not be defined on all of $X$ but only on a neighborhood of $x$. The
set of functions $f : U \subseteq X \to Y$, where $U$ is an open subset of a normed space $X$
and $f$ is differentiable at all points $x \in U$, is here denoted $D(U, Y)$.
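
The defining condition $\|o(h)\|/\|h\| \to 0$ can be observed numerically. The following is a minimal sketch, not part of the text's development: the map $f(x, y) = x^2 - y$ and the candidate derivative $(h, k) \mapsto 2xh - k$ are choices made here purely for illustration, and the function names are hypothetical.

```python
import math

def f(x, y):
    # Illustrative map R^2 -> R (an assumption of this sketch).
    return x * x - y

def derivative(x, y):
    # Candidate derivative at (x, y): the linear map (h, k) -> 2x*h - k.
    return lambda h, k: 2 * x * h - k

def remainder_ratio(x, y, h, k):
    # ||o(h)|| / ||h||, where o(h) := f(x + h) - f(x) - f'(x)h.
    o = f(x + h, y + k) - f(x, y) - derivative(x, y)(h, k)
    return abs(o) / math.hypot(h, k)

# Shrink the increment along the diagonal direction (h, h).
ratios = [remainder_ratio(1.0, 2.0, 10.0 ** -n, 10.0 ** -n) for n in range(1, 6)]
print(ratios)
```

Here the remainder is exactly $h^2$, so the printed ratios shrink in proportion to the size of the increment, which is what differentiability requires.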


J. Muscat, Functional Analysis, DOI: 10.1007/978-3-319-06728-5_12,

Proposition 12.2

The set of differentiable functions $D(U, Y)$ forms a vector space.

Differentiation $D : f \mapsto f'$ is a linear map, which takes composition of
functions to operator products,

\[ (f + g)' = f' + g', \qquad (\lambda f)' = \lambda f', \]

\[ (f \circ g)'(x) = f'(g(x))\,g'(x). \]


Note that the domain of $D$ is $D(U, Y)$ and its codomain is the vector space
of functions $\{g : U \to B(X, Y)\}$. The last identity is called the chain rule of
differentiation.

Proof The statements follow from the following identities and inequalities:

\[ (f + g)(x + h) = f(x + h) + g(x + h) \]
\[ = f(x) + f'(x)h + o_f(h) + g(x) + g'(x)h + o_g(h) \]
\[ = f(x) + g(x) + (f' + g')(x)h + (o_f(h) + o_g(h)) \]

\[ \lambda f(x + h) = \lambda f(x) + \lambda f'(x)h + \lambda o(h) \]

\[ f \circ g(x + h) = f(g(x + h)) \]
\[ = f(g(x) + g'(x)h + o_g(h)) \]
\[ = f(g(x)) + f'(g(x))(g'(x)h + o_g(h)) + o_f(h) \]
\[ = f(g(x)) + f'(g(x))g'(x)h + (f'(g(x))o_g(h) + o_f(h)) \]

\[ \|o_f(h) + o_g(h)\| \le \|o_f(h)\| + \|o_g(h)\|, \]
\[ \|\lambda o(h)\| = |\lambda|\,\|o(h)\|, \]
\[ \|T o_g(h) + o_f(h)\| \le \|T\|\|o_g(h)\| + \|o_f(h)\|, \quad \text{for any } T \in B(X, Y). \]

□
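
The chain rule can be checked numerically in finite dimensions, where derivatives are Jacobian matrices and the operator product is matrix multiplication. The sketch below is only an illustration under assumptions made here: the maps `f` and `g` on $\mathbb{R}^2$, their hand-computed Jacobians, and the central-difference helper are all hypothetical choices, not taken from the text.

```python
import math

def g(p):
    x, y = p
    return (x * y, x + y * y)

def dg(p):
    # Jacobian of g at p.
    x, y = p
    return [[y, x], [1.0, 2 * y]]

def f(q):
    u, v = q
    return (math.sin(u), u * v)

def df(q):
    # Jacobian of f at q.
    u, v = q
    return [[math.cos(u), 0.0], [v, u]]

def matmul(a, b):
    # Product of two 2x2 matrices.
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def numerical_jacobian(func, p, eps=1e-6):
    # Central differences; entry (i, j) approximates d func_i / d p_j.
    jac = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        plus, minus = list(p), list(p)
        plus[j] += eps
        minus[j] -= eps
        fp, fm = func(plus), func(minus)
        for i in range(2):
            jac[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return jac

p = (0.7, -0.3)
chain = matmul(df(g(p)), dg(p))              # f'(g(x)) g'(x)
numeric = numerical_jacobian(lambda q: f(g(q)), p)
max_error = max(abs(chain[i][j] - numeric[i][j])
                for i in range(2) for j in range(2))
print(max_error)
```

The two matrices agree up to the finite-difference error, which is what $(f \circ g)'(x) = f'(g(x))g'(x)$ predicts.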

Examples 12.3 

1. The constant functions $f(x) := y_0$ are differentiable with $f' = 0$.

2. In $\mathbb{R}$ or $\mathbb{C}$, the functions $f(x) := x^n$ are differentiable with

\[ f(x + h) = (x + h)^n = x^n + nx^{n-1}h + o(h), \]

so $f'(x) = nx^{n-1}$. Polynomials are thus differentiable.

3. Continuous linear maps are differentiable, $T(x + h) = Tx + Th$, so $T'(x) = T$.
A special case of the composition law is $(T \circ f)' = T \circ f'$ when $T$ is a fixed
operator.

4. The derivative of $F : \mathbb{R} \to \mathbb{R}^2$, $F(t) := \binom{f(t)}{g(t)} = f(t)\binom{1}{0} + g(t)\binom{0}{1}$ is $F'(t) = \binom{f'(t)}{g'(t)}$.
A differentiable path $r : \mathbb{R} \to X$ is called a curve. The direction of its
derivative $r'$ is called its tangent. The arclength of a curve is $\int_r ds := \int \|r'(t)\|\,dt$.

5. Define $f : \mathbb{R}^2 \to \mathbb{R}$ by $f(x, y) := x^2 - y$. Then $f'(x, y) : \mathbb{R}^2 \to \mathbb{R}$ is its
gradient $f'(x, y) = (2x, -1)$ since

\[ f(x + h, y + k) = (x + h)^2 - (y + k) = (x^2 - y) + (2x, -1)\binom{h}{k} + h^2. \]

The map $(h, k) \mapsto (x_0^2 - y_0) + 2x_0 h - k$ gives the tangent plane to the surface
$z = f(x, y)$ at the point $(x_0, y_0, z_0)$.

6. A real inner product $\langle\cdot, \cdot\rangle : X^2 \to \mathbb{R}$ is differentiable,

\[ \langle x + h, y + k\rangle = \langle x, y\rangle + (\langle x, k\rangle + \langle h, y\rangle) + \langle h, k\rangle. \]

The middle term is linear in $(h, k)$, and the last term is $o(h, k)$ by the Cauchy-
Schwarz inequality,

\[ \frac{|\langle h, k\rangle|}{\|h\| + \|k\|} \le \frac{\|h\|\|k\|}{\|h\| + \|k\|} \le \|h\| \to 0 \quad \text{as } (h, k) \to (0, 0). \]

7. We often write $D_v f(x) := f'(x)v$. Note that

\[ D_{v+w} f = D_v f + D_w f, \qquad D_{\lambda v} f = \lambda D_v f. \]

Because of this last property, $v$ is usually taken to be a unit vector.

When $X = \mathbb{R}$ there are only two unit vectors, $v = \pm 1$, and the notation used is
$\frac{d}{dt} := D_1$ for the derivative in the positive direction. Similarly, for $\mathbb{C}$, $\frac{d}{dz} := D_1$.
In $\mathbb{R}^N$, the standard basis consists of $N$ unit vectors $e_n$, and we define $\partial_n := D_{e_n}$.

8. For $X = \mathbb{R}$, the derivative can be taken to be a function $f' : \mathbb{R} \to Y$, since
$B(\mathbb{R}, Y) \cong Y$.

9. ► Differentiable functions are continuous in $x$, in fact are Lipschitz in a
neighborhood of any point

\[ \|f(y) - f(x)\| = \|f'(x)(y - x) + o(y - x)\| \le c\|y - x\|. \]

In particular, $f(y) \to f(x)$ as $y \to x$. But there are Lipschitz functions, such
as $x \mapsto |x|$ on $\mathbb{R}$, that are not differentiable.

10. * The set of functions $f \in C(\mathbb{R})$ with bounded continuous derivatives, $f' \in
C(\mathbb{R})$, is denoted by $C^1(\mathbb{R})$. It is a non-closed linear subspace of $C(\mathbb{R})$. However,
it can be given a complete norm

\[ \|f\|_{C^1} := \|f\|_C + \|f'\|_C. \]

Differentiation is then a continuous operator $C^1(\mathbb{R}) \to C(\mathbb{R})$.

Proof The functions $\sin nx$ have unit norms in $C(\mathbb{R})$, but their derivatives $n\cos nx$
have arbitrarily large $\infty$-norm. This allows us to define

\[ f(x) := \sum_{n=1}^{\infty} \frac{1}{2^n}\sin 4^n x \]

with the partial sums $f_N$ converging in $C(\mathbb{R})$ (the series is absolutely convergent). But
this is an example of a nowhere-differentiable function (check it is not differentiable
at 0 at least), so although $f_N \in C^1(\mathbb{R})$ and $f_N \to f$ uniformly, $f \notin C^1(\mathbb{R})$.

$C^1(\mathbb{R})$ is complete: if $(f_n)$ is a Cauchy sequence in $C^1(\mathbb{R})$, then $(f_n)$ and $(f_n')$ are
Cauchy sequences in the complete space $C(\mathbb{R})$, so they converge to $f, g \in C(\mathbb{R})$. By
Proposition 9.18, $f' = g$ and $f_n \to f$ in $C^1(\mathbb{R})$.

That differentiation is continuous is trivial for this space:

\[ \|Df\|_C = \|f'\|_C \le \|f\|_C + \|f'\|_C = \|f\|_{C^1}. \]

It is not continuous when $C^1(\mathbb{R})$ is considered as a subspace of $C(\mathbb{R})$.
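
The failure of continuity in the $C(\mathbb{R})$ norm can be seen on partial sums of a lacunary sine series. The sketch below assumes the series $\sum_n 2^{-n}\sin 4^n x$ (one reading of the example above), samples the sup-norm on a finite grid, which only approximates it, and uses hypothetical helper names.

```python
import math

def partial_sum(N, x):
    # f_N(x) = sum_{n=1}^{N} sin(4^n x) / 2^n
    return sum(math.sin(4 ** n * x) / 2 ** n for n in range(1, N + 1))

def derivative_at_zero(N):
    # f_N'(0) = sum_{n=1}^{N} 4^n / 2^n = 2^(N+1) - 2
    return sum(4 ** n / 2 ** n for n in range(1, N + 1))

# Sample grid on [0, 2*pi]; the true sup-norm is only approximated.
grid = [k * 2 * math.pi / 2000 for k in range(2001)]
for N in (2, 4, 6, 8):
    # Distance from f_N to the later partial sum f_12, sampled on the grid.
    sup_gap = max(abs(partial_sum(12, x) - partial_sum(N, x)) for x in grid)
    print(N, sup_gap, derivative_at_zero(N))
```

The sampled gaps are bounded by the tail $\sum_{n>N} 2^{-n} < 2^{-N}$, so the partial sums are uniformly Cauchy, while their derivatives at 0 grow like $2^{N+1}$: closeness in $C(\mathbb{R})$ gives no control over $\|f'\|_C$.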

Proposition 12.4 


The kernel of $D$ on $D(X, Y)$ consists of the constant functions,

\[ Df = 0 \Rightarrow f \text{ is constant}. \]


Proof We first identify the kernel when the differentiable functions are real, $g : \mathbb{R} \to \mathbb{R}$.
Suppose $g'(t) = 0$ for all $t \in [a, b]$, and let

\[ G(t) := g(t) - \frac{(t - a)g(b) + (b - t)g(a)}{b - a}, \]

also differentiable, with $G(a) = 0 = G(b)$, and

\[ G(t + h) - G(t) = G'(t)h + o(h) = -\frac{g(b) - g(a)}{b - a}\,h + o(h), \quad t \in\, ]a, b[. \tag{12.1} \]

$G$ is continuous on the compact set $[a, b]$, so it must have maximum and minimum
points. We can assume one of them to be inside $]a, b[$, for if they are at $a$ and $b$, then
trivially $G$ is 0 throughout $[a, b]$.

Now, on any minimum $t_0$ of $G$ within $]a, b[$, as $h$ changes sign from negative to
positive, $G(t_0 + h) - G(t_0)$ remains non-negative; on a maximum it remains non-positive.
From (12.1), this can only hold if $g(a) = g(b)$. As $a$ and $b$ are arbitrary, this shows
that $g$ is constant.

For $f' = 0$ on $X$, we can use functionals to reduce it to a real-valued function:
let $g(t) := \phi \circ f(tx)$ for any non-zero $x \in X$ and $\phi \in Y^*$. It is differentiable,

\[ g(t + h) = \phi \circ f(tx + hx) = \phi(f(tx)) + \phi(f'(tx)hx) + o(hx) = g(t) + o(hx), \]

with derivative $g'(t) = 0$. By the first part, $g(t) = g(0) = \phi \circ f(0)$ is constant. But
with $\phi$ and $x$ arbitrary, this shows that $f = f(0)$, a constant function. □

Exercises 12.5 

1. Show that for differentiable functions $\lambda : \mathbb{R} \to \mathbb{F}$, $f, g : \mathbb{R} \to X$, $T : \mathbb{R} \to
B(X, Y)$,

(a) $\frac{d}{dt}(\lambda(t)f(t)) = \lambda'(t)f(t) + \lambda(t)f'(t)$,

(b) $\langle f, g\rangle' = \langle f', g\rangle + \langle f, g'\rangle$,

(c) $\frac{d}{dt}F(f(t), g(t)) = \partial_1 F(f(t), g(t))f'(t) + \partial_2 F(f(t), g(t))g'(t)$,

(d) $\frac{d}{dt}(T(t)f(t)) = T'(t)f(t) + T(t)f'(t)$.

2. For a curve on the sphere $r : [0, 1] \to S^2$, the tangent $t$ at any point satisfies
$t \cdot r = 0$.

3. ► For a differentiable function $y : \mathbb{R}^N \to \mathbb{R}^M$, $y'$ is the Jacobi matrix $[\partial_i y_j]$.

4. The derivative itself, $f'(x)$, need not be continuous in $x$. For example, show
that $f(x) := x^2\sin(1/x)$ (and $f(0) := 0$) is differentiable at all points, yet its
derivative is not continuous at 0.

5. If $f : X \to \mathbb{R}$ is differentiable and has a maximum/minimum at $x$ in some open
set $U \subseteq X$, then $f'(x) = 0$.

6. L’Hôpital’s rule: If $f : \mathbb{R} \to X$, $g : \mathbb{R} \to \mathbb{F}$ are differentiable functions satisfying
$f(a) = 0$, $g(a) = 0$, but $g'(a) \ne 0$, then

\[ \lim_{x \to a} \frac{f(x)}{g(x)} = \frac{f'(a)}{g'(a)}. \]


12.2 Integration for Vector- Valued Functions 

The construction of $L^1(\mathbb{R})$ can be extended to include functions $f : \mathbb{R} \to X$, where
$X$ is a Banach space, as done in Section 9.2. Briefly,

• a vector-valued characteristic function $x1_E$ maps $t$ to $x \in X$ when $t \in E \subseteq \mathbb{R}$
and to 0 otherwise;

• a simple function is a linear combination of vector characteristic functions on sets
of finite measure, in which case,

\[ \int \sum_{n=1}^{N} x_n 1_{E_n} := \sum_{n=1}^{N} \mu(E_n)\,x_n. \]

The set of simple functions is a normed space with $\|s\| := \int \|s(t)\|_X\,dt$.

• a function $f : \mathbb{R} \to X$ is integrable when it is the ae-limit of a Cauchy sequence
of simple functions $s_n \to f$ a.e., $\|s_n - s_m\| \to 0$ as $n, m \to \infty$; its integral is

\[ \int f := \lim_{n\to\infty} \int s_n. \]

• on a measurable set $A \subseteq \mathbb{R}$, $\int_A f := \int f 1_A$, e.g. $\int_a^b f = \int_{[a,b]} f$ for $a \le b$.

Quoting the results of Section 9.2,

Proposition 12.6 


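For $f, g : \mathbb{R} \to X$ integrable,

(i) $\int (f + g) = \int f + \int g$, $\int \lambda f = \lambda\int f$ ($\lambda \in \mathbb{F}$),

(ii) $\|\int f\| \le \int \|f(t)\|\,dt$,

(iii) $\int \lambda(t)x\,dt = (\int \lambda)x$ for $\lambda \in L^1(\mathbb{R})$, $x \in X$,

(iv) $\int Tf = T\int f$ for $T \in B(X, Y)$.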


Examples 12.7 




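1. $\int \binom{f(t)}{g(t)}\,dt = \int f(t)\binom{1}{0} + g(t)\binom{0}{1}\,dt = \binom{\int f}{\int g}$ when $f, g : \mathbb{R} \to \mathbb{R}$ are
integrable. Similarly, $\int_0^1 \binom{t^2}{t^3}\,dt = \binom{1/3}{1/4}$.

2. Any continuous function $f : [a, b] \to X$ is integrable, since $\int_a^b \|f(t)\|\,dt \le
(b - a)\|f\|_C$.

3. If $f_n(t) \to f(t)$ in $X$, uniformly in $t \in [a, b]$, then $\int_a^b f_n \to \int_a^b f$ in $X$, since

\[ \Big\|\int_a^b (f_n - f)\Big\| \le \int_a^b \|f_n(t) - f(t)\|\,dt \le \|f_n - f\|_{L^\infty[a,b]}(b - a). \]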
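
The integral of a simple function is a finite sum $\sum_n \mu(E_n)x_n$, so a midpoint Riemann sum gives a direct numerical stand-in for the integral of a continuous vector-valued function. The following is a minimal sketch with $X = \mathbb{R}^d$ represented by tuples; the helper `integrate` and the test functions are assumptions of this illustration.

```python
import math

def integrate(f, a, b, n=10000):
    # Integral of the simple function that takes the value f(midpoint)
    # on each of n equal subintervals of [a, b].
    h = (b - a) / n
    dim = len(f(a))
    total = [0.0] * dim
    for k in range(n):
        value = f(a + (k + 0.5) * h)
        for i in range(dim):
            total[i] += value[i] * h
    return total

# Componentwise integration: approximates (1/3, 1/4).
moments = integrate(lambda t: (t ** 2, t ** 3), 0.0, 1.0)

# Property (ii): the norm of the integral is at most the integral of the norm.
arc = integrate(lambda t: (math.cos(t), math.sin(t)), 0.0, math.pi)
norm_of_integral = math.hypot(*arc)
integral_of_norm = integrate(
    lambda t: (math.hypot(math.cos(t), math.sin(t)),), 0.0, math.pi)[0]

print(moments)
print(norm_of_integral, integral_of_norm)
```

For the half circle the integral is $(0, 2)$, of norm 2, while the integral of the norm is $\pi$, so the inequality in property (ii) is strict here.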

The connection between differentiation and integration is one of the cornerstones 
of classical mathematics. It remains valid for vector-valued functions: 





Theorem 12.8 Fundamental Theorem of Calculus 


If $f : [a, b] \to X$ is integrable, and continuous at $t \in\, ]a, b[$, then its integral
is differentiable at $t$, and

\[ \frac{d}{dt}\int_a^t f = f(t). \]

If $f' : [a, b] \to X$ is continuous, then

\[ \int_a^b f' = f(b) - f(a). \]


Proof (i) The first part is a consequence of

\[ \int_a^{t+h} f = \int_a^t f + f(t)h + \Big(\int_t^{t+h} f - f(t)h\Big) \]

and

\[ \Big\|\frac{1}{h}\int_t^{t+h} f(\tau)\,d\tau - f(t)\Big\| = \Big\|\int_t^{t+h} \frac{f(\tau) - f(t)}{h}\,d\tau\Big\| \le \Big|\int_t^{t+h} \frac{\|f(\tau) - f(t)\|}{|h|}\,d\tau\Big| \le \frac{\epsilon}{|h|}\Big|\int_t^{t+h} d\tau\Big| = \epsilon \]

for arbitrary $\epsilon > 0$ and $|h|$ sufficiently small, since $f$ is continuous at $t$.

(ii) For the second part, let $F(t) := \int_a^t f'$. By (i) we obtain $F' = f'$ on $]a, b[$, so
their difference $F(t) - f(t)$ must be a constant $c$. As $F(a) = 0$, $c = -f(a)$. □

Proposition 12.9 Mean value theorem 

For a continuous function $f : [a, b] \to X$,

\[ \frac{1}{b - a}\int_a^b f(t)\,dt \]

belongs to the closed convex hull of $f[a, b]$.


Proof The function is uniformly continuous (Proposition 6.17), so splitting $[a, b]$
into small enough intervals $[t_n, t_{n+1}]$ of size $h = (b - a)/N$ each ($t_n := a + nh$),
ensures that $\|f(t) - f(t')\| < \epsilon$ whenever $t, t'$ are in the same sub-interval. This
means that $f$ can be approximated uniformly by a simple function which takes the
value $f(t_n)$ on the interval $[t_n, t_{n+1}[$, and its integral $\int_a^b f$ can be approximated to
within $\epsilon(b - a)$ by the sum

\[ (f(a) + f(t_1) + \cdots + f(t_{N-1}))h. \]

Thus $\frac{1}{b-a}\int_a^b f$ is within $\epsilon$ of $(f(a) + f(a + h) + \cdots + f(b - h))/N$ which belongs
to the convex hull of $f[a, b]$. Since $\epsilon$ is arbitrarily small, the result follows. □
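
For vector-valued functions the mean value need not be attained by $f$ itself; it only lies in the closed convex hull of the image. A small sketch, assuming the illustrative path $f(t) = (\cos t, \sin t)$ on $[0, 2\pi]$ (a choice made here, not in the text), computes the convex combination used in the proof:

```python
import math

def f(t):
    # Illustrative path in R^2: every value has norm 1.
    return (math.cos(t), math.sin(t))

def convex_average(a, b, N):
    # (f(a) + f(a + h) + ... + f(b - h)) / N with h = (b - a) / N,
    # the convex combination of values of f that appears in the proof.
    h = (b - a) / N
    points = [f(a + n * h) for n in range(N)]
    return tuple(sum(p[i] for p in points) / N for i in range(2))

avg = convex_average(0.0, 2 * math.pi, 1000)
print(avg, math.hypot(*avg))
```

The average is the origin up to rounding, a point of the convex hull (the closed disk) at distance 1 from every value $f(t)$, so no $t$ satisfies $f(t) = \frac{1}{b-a}\int_a^b f$ in this case.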

Corollary 12.10 


For a continuously differentiable function $f : [a, b] \to X$,

\[ \frac{f(b) - f(a)}{b - a} \]

belongs to the closed convex hull of $f'[a, b]$.


Proof

\[ \frac{f(b) - f(a)}{b - a} = \frac{1}{b - a}\int_a^b f'. \]

□


Recall that $f'$ is a function $U \to B(X, Y)$: it may itself be differentiable,
with derivative denoted by $f''(x) \in B(X, B(X, Y))$. This Banach space is
actually isomorphic to the space of bilinear maps $B(X^2, Y)$ via the identification
$(Tx_1)x_2 = T(x_1, x_2)$. Because of this, $f''(x)$ is akin to an operator that converts a
pair of vectors of $X$ into a vector in $Y$; in particular, $f''(x)(h, h)$ makes sense, and
is often shortened into the form $f''(x)h^2$.

More generally, $f^{(n)}$ is the $n$th derivative of $f$: it takes $n$ vectors in $X$ and outputs
a vector in $Y$. The set of $n$-times differentiable functions $f : \mathbb{R} \to X$, with $f^{(n)}$
continuous, is denoted by $C^n(\mathbb{R}, X)$.

Theorem 12.11 Taylor’s theorem 


For $f \in C^n(\mathbb{R}, X)$ ($n = 1, 2, \ldots$),

\[ f(t + h) = f(t) + f'(t)h + \cdots + \frac{f^{(n)}(t)h^n}{n!} + o(h^n). \]

Proof As expected the proof proceeds by induction on $n$. To illustrate the idea behind
the inductive step, we only consider how the statement for $n = 2$ follows from that
for $n = 1$. Let $f \in C^2(\mathbb{R}, X)$, and let

\[ F(s) := f(t + s) - f(t) - f'(t)s - f''(t)s^2/2! \]

We wish to show $F(h) = o(h^2)$. $F$ is continuously differentiable in $s$ because it
consists of sums and products of continuously differentiable functions, in fact

\[ F'(s) = f'(t + s) - f'(t) - f''(t)s = o(s), \]

since $f'$ is differentiable. Using the above corollary, it follows that $(F(h) - F(0))/h$ belongs
to the closed convex hull of $F'[0, h]$, whose values are at most of order $o(h)$. Since
$F(0) = 0$, we have $F(h) = o(h^2)$ as required.

The reader is invited to adapt this proof to show that if the statement is correct for
$n$ then it is also true for $n + 1$. The case $n = 1$ is, of course, part of the definition of
the derivative. □
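
The statement $F(h) = o(h^2)$ can be checked numerically for a concrete curve. The sketch below assumes the illustrative path $f(t) = (\cos t, e^t)$ in $\mathbb{R}^2$, with derivatives written out by hand; none of these choices come from the text.

```python
import math

def f(t):
    return (math.cos(t), math.exp(t))

def f1(t):
    # First derivative of f.
    return (-math.sin(t), math.exp(t))

def f2(t):
    # Second derivative of f.
    return (-math.cos(t), math.exp(t))

def remainder(t, h):
    # || f(t + h) - f(t) - f'(t) h - f''(t) h^2 / 2 ||
    comps = [f(t + h)[i] - f(t)[i] - f1(t)[i] * h - f2(t)[i] * h * h / 2
             for i in range(2)]
    return math.hypot(*comps)

t = 0.5
ratios = [remainder(t, 2.0 ** -k) / (2.0 ** -k) ** 2 for k in range(2, 10)]
print(ratios)
```

The ratios $\|F(h)\|/h^2$ roughly halve each time $h$ is halved, consistent with a remainder of order $h^3$ and hence $o(h^2)$.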

Exercises 12.12 

1. Integration by parts: $\int_a^b f(t)F'(t)\,dt = [fF]_a^b - \int_a^b f'(t)F(t)\,dt$, where $f : \mathbb{R} \to
\mathbb{F}$ and $F : \mathbb{R} \to X$ have continuous derivatives.

2. Change of variables: $\int_a^b f(x)\,dx = \int_{y(a)}^{y(b)} F(y)\frac{dx}{dy}\,dy$, where $y : \mathbb{R} \to \mathbb{R}$ has an
invertible continuous derivative, and $F(y(x)) = f(x)$.

3. If $f : [a, b] \to M$ is continuous, where $M$ is a closed linear subspace of $X$, then
$\int_a^b f \in M$.

4. The symbol $o(h)$ satisfies $\|o(h)\| \le c\|h\|$ for $h$ small enough, but not necessarily
$\|o(h)\| \le c\|h\|^2$. However show that the latter inequality is true if $f'(y)$ is
Lipschitz in $y$ in some ball about $x$, by evaluating

\[ \int_0^1 \frac{d}{dt}f(x + th) - f'(x)h\;dt. \]

Application: The Newton-Raphson Algorithm 

If it is required to find a vector $x$ which solves $f(x) = 0$, where $f$ is differentiable,
we might start with a first estimate $x$ and find a better approximation from

\[ 0 = f(x + h) \approx f(x) + f'(x)h, \]

namely $h = -f'(x)^{-1}f(x)$. This suggests the following iteration:

Proposition 12.13 The Newton-Raphson Method


Let $\tilde{x}$ be a zero of $f$ and suppose that in a neighborhood of $\tilde{x}$, $f$ is
differentiable with $f'(x)$ Lipschitz in $x$ and $\|f'(x)^{-1}\| \le c$. Then if $x_0$ is
sufficiently close to $\tilde{x}$, the iteration

\[ x_{n+1} := x_n - f'(x_n)^{-1}f(x_n) \]

converges to $\tilde{x}$.


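Proof The differentiability of $f$ at $\tilde{x}$ states that for $h = x_n - \tilde{x}$, $\|h\| < \epsilon$,

\[ f(\tilde{x} + h) = f(\tilde{x}) + f'(\tilde{x})h + o(h), \]

\[ \therefore\ f(x_n) = f'(x_n)h + (f'(\tilde{x}) - f'(x_n))h + o(h), \]

\[ \therefore\ f'(x_n)^{-1}f(x_n) = x_n - \tilde{x} + f'(x_n)^{-1}\big((f'(\tilde{x}) - f'(x_n))h + o(h)\big), \]

\[ \therefore\ \|x_{n+1} - \tilde{x}\| \le c\big(k\|h\|^2 + \tfrac{1}{2}k\|h\|^2\big) = C\|h\|^2 = C\|x_n - \tilde{x}\|^2, \]

where $k$ is the Lipschitz constant of $f'$, $\|o(h)\| \le \tfrac{1}{2}k\|h\|^2$ (Exercise 12.12(4)
above), and $C := \tfrac{3}{2}ck$. If $\epsilon < 1/C$ then it implies firstly that if $x_n$ belongs to $B_\epsilon(\tilde{x})$, then so does
$x_{n+1}$, and secondly by induction it follows that $\|x_n - \tilde{x}\| \le (C\|x_0 - \tilde{x}\|)^{2^n}/C \to 0$
as $n \to \infty$. □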

This method is very effective since it converges quadratically, as long as $x_0$ and
$\tilde{x}$ are already close enough. In practice, other algorithms are utilized to perform a
broad search for a zero, and Newton’s method is then used to rapidly home in on it.
Another caveat is that it may be computationally expensive: one has to calculate not
only the derivative $f'(x)$ but effectively also its inverse. The methods that are most
often used employ modified iterations like $x_{n+1} := x_n - H_n f(x_n)$, where $H_n$ are
operators that approximate $f'(x_n)^{-1}$ but are easier to calculate.
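
As an illustration of the iteration in $\mathbb{R}^2$, the sketch below applies it to the pair of equations $x^2 + y = 1$, $x + y^2 = 2$. The starting point $(1.5, -1)$, the function names, and the use of Cramer's rule to solve the linear step (rather than forming $f'(x_n)^{-1}$) are choices made here for illustration only.

```python
def F(x, y):
    # Residuals of the system x^2 + y = 1, x + y^2 = 2.
    return (x * x + y - 1.0, x + y * y - 2.0)

def J(x, y):
    # Jacobian matrix of F.
    return ((2 * x, 1.0), (1.0, 2 * y))

def newton_2d(F, J, x, y, steps=8):
    history = []
    for _ in range(steps):
        f1, f2 = F(x, y)
        history.append(max(abs(f1), abs(f2)))
        (a, b), (c, d) = J(x, y)
        det = a * d - b * c
        # Solve J (dx, dy) = -(f1, f2) by Cramer's rule.
        dx = (-f1 * d + b * f2) / det
        dy = (-a * f2 + c * f1) / det
        x, y = x + dx, y + dy
    return x, y, history

x, y, history = newton_2d(F, J, 1.5, -1.0)
print(x, y)
print(history)
```

The residual history shows the characteristic behavior of quadratic convergence: once the iterate is close, the number of correct digits roughly doubles at each step until rounding error takes over.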

Examples 12.14 

1. To solve for $e^{iz} = 1$ close to $z = 6$, Newton’s iteration can be applied to $f(z) :=
e^{iz} - 1$,

\[ x_{n+1} := x_n + i(1 - e^{-ix_n}) \]

x_0   6
x_1   6.27942 + 0.03983i
x_2   6.28334 - 0.00080i
x_3   6.28319 - 0.0i


Examples of other equations whose solutions are routinely found using this
method are (a) roots of polynomials e.g. $x^3 = 2$, (b) transcendental equations
such as $x - \sin x = 1$ or $x\tan x = 1$, (c) simultaneous non-linear equations,
e.g. $x^2 + y = 1$, $x + y^2 = 2$.

2. The method can be used to find the minimum of a scalar differentiable function,
say on $\mathbb{R}^2$, which is equivalent to finding zeros of its derivative. For example, if
the function were exactly quadratic

\[ f(x) = a + b \cdot x + \tfrac{1}{2}x^\top Ax \]

then the minimum occurs when $Ax + b = 0$, and Newton’s method finds the
minimum point in one step: $x_1 = -A^{-1}b$. The more undulating a function is, the
more demanding it becomes to find the true minimum. Two challenging functions
that have served as benchmarks are the following

(a) $(1 - x)^2 + 100(y - x^2)^2$ (Rosenbrock’s valley),

(b) $(x^2 + y - 11)^2 + (x + y^2 - 7)^2$ (Himmelblau’s function).

3. * To align two real-valued functions $f$ and $g$ as best as possible, one may find $a$
that minimizes $\int (f(x + a) - g(x))^2\,dx$. Expanding this out, then differentiating
in $a$, gives

\[ \int (f(x) - g(x))^2 + 2(f(x) - g(x))f'(x)a + (f'(x)a)^2 + (f(x) - g(x))a^\top f''(x)a\;dx + o(a^2) \]

\[ \int (f(x) - g(x))f'(x) + (f'(x)a)f'(x) + (f(x) - g(x))f''(x)a\;dx + o(a) = 0 \]

The Newton-Raphson estimate of $a$ is

\[ a \approx -\big(\langle f', f'\rangle + \langle f - g, f''\rangle\big)^{-1}\langle f - g, f'\rangle. \]

Letting $f_{n+1}(x) := f_n(x + a)$, $f_0(x) := f(x)$, and iterating aligns the two
functions. (You can try this out with $f(x) = \cos x$ and $g(x) = \cos(x + 1)$ over
the interval $[0, 2\pi]$.) This method has been implemented to align images (when
$x, a \in \mathbb{R}^2$), for example to compensate for video camera jitter from one frame to
the next.
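
The table of iterates in Example 1 can be reproduced with a few lines of complex arithmetic. This is a minimal sketch using the simplified step $x + i(1 - e^{-ix})$, which equals $x - f(x)/f'(x)$ for $f(z) = e^{iz} - 1$; the function name is hypothetical.

```python
import cmath

def newton_step(x):
    # x - f(x)/f'(x) for f(z) = exp(iz) - 1, i.e. x + i(1 - exp(-ix)).
    return x + 1j * (1 - cmath.exp(-1j * x))

x = 6 + 0j
iterates = [x]
for _ in range(3):
    x = newton_step(x)
    iterates.append(x)

for n, value in enumerate(iterates):
    print(n, f"{value.real:.5f} {value.imag:+.5f}i")
```

The iterates approach $2\pi \approx 6.28319$, the zero of $f$ nearest to the starting value 6, and the error is roughly squared at each step.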


12.3 Complex Differentiation and Integration 

Let $X$ be a complex Banach space, then a differentiable function $f : \mathbb{C} \to X$ is also
called analytic, i.e., for all $z, h$,

\[ f(z + h) = f(z) + f'(z)h + o(h). \]

The set of functions $f : \mathbb{C} \to \mathbb{C}$ which are analytic at all points $z$ in an open set
$U \supseteq A$, is denoted by $C^\omega(A)$.

A function $f : \mathbb{C} \to X$ is integrable along a differentiable path $w : [t_0, t_1] \to \mathbb{C}$,
when the composition $f \circ w : [t_0, t_1] \to \mathbb{C} \to X$ is integrable. Its integral is then

\[ \int_w f(z)\,dz := \int_{t_0}^{t_1} f(w(t))w'(t)\,dt. \]

Notice that $dz/i$ is along the normal to a path. Proposition 12.6 remains true, for
example property (ii) becomes

\[ \Big\|\int_w f(z)\,dz\Big\| \le \int \|f(w(t))\|\,ds, \quad \text{where } ds := |w'(t)|\,dt. \]

Examples 12.15 

1. Along any curve $w$ which starts at $w(0) = a + bi$ and ends at $w(1) = c + di$,

\[ \int_w 1\,dz = \int_0^1 w'(t)\,dt = [w(t)]_0^1 = [z]_{a+bi}^{c+di}. \]

More generally, $\int_w f'(z)\,dz = \int_0^1 f'(w(t))w'(t)\,dt = [f(z)]_{a+bi}^{c+di}$ for $f$ analytic
(with $f'$ continuous). Thus one can integrate analytic functions in the same
manner as real-valued functions.

2. The map $z \mapsto \frac{1}{z}$ is analytic except at $z = 0$. On a circular path $w(t) := re^{it}$,
$0 \le t \le 2\pi$,

\[ \oint \frac{1}{z}\,dz = \int_0^{2\pi} \frac{1}{r}e^{-it}\,ire^{it}\,dt = 2\pi i \]

(independent of the radius). Thus the integral $\int_{-1}^{1} \frac{1}{z}\,dz$ does not have a unique
answer, but depends on whether one traverses a path that passes above or below
the origin, and how often it loops around it. But otherwise $\oint \frac{1}{z^n}\,dz = 0$ (for integers $n \ne 1$).

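3. Cauchy-Riemann equations: An analytic function $f : \mathbb{C} \to \mathbb{C}$, $x + iy \mapsto
u(x, y) + iv(x, y)$ satisfies the equations

\[ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}, \]

since $f'(z) = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} - i\frac{\partial u}{\partial y}$ which can be obtained by comparing, for real $h$,

\[ f(z + h) = u(x, y) + \frac{\partial u}{\partial x}h + iv(x, y) + i\frac{\partial v}{\partial x}h + o(h) = f(z) + f'(z)h + o(h), \]

\[ f(z + ih) = u(x, y) + \frac{\partial u}{\partial y}h + iv(x, y) + i\frac{\partial v}{\partial y}h + o(h) = f(z) + f'(z)ih + o(h). \]

4. The conjugate map $z \mapsto \bar{z}$ is not analytic, $\overline{z + h} = \bar{z} + \bar{h}$. Therefore, $\mathrm{Re}(z) =
(z + \bar{z})/2$, $\mathrm{Im}(z) = (z - \bar{z})/2i$, and $|z|^2 = z\bar{z}$, are not analytic. Indeed the Cauchy-
Riemann equations can be written symbolically as $\frac{\partial f}{\partial\bar{z}} = 0$, and interpreted as $f$
being independent of $\bar{z}$.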
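
The path integral $\int_w f(z)\,dz := \int f(w(t))w'(t)\,dt$ is straightforward to approximate numerically. The sketch below uses a midpoint rule on a circle of radius 2.5; the radius, the helper names, and the three integrands are choices made here for illustration.

```python
import cmath
import math

def path_integral(f, w, dw, t0, t1, n=20000):
    # Midpoint rule for the integral of f(w(t)) w'(t) dt over [t0, t1].
    h = (t1 - t0) / n
    total = 0j
    for k in range(n):
        t = t0 + (k + 0.5) * h
        total += f(w(t)) * dw(t) * h
    return total

r = 2.5

def circle(t):
    return r * cmath.exp(1j * t)

def dcircle(t):
    return 1j * r * cmath.exp(1j * t)

inv = path_integral(lambda z: 1 / z, circle, dcircle, 0.0, 2 * math.pi)
inv_sq = path_integral(lambda z: 1 / z ** 2, circle, dcircle, 0.0, 2 * math.pi)
conj = path_integral(lambda z: z.conjugate(), circle, dcircle, 0.0, 2 * math.pi)

print(inv, inv_sq, conj)
```

The first value approximates $2\pi i$ whatever the radius, the second vanishes, and the integral of the non-analytic map $z \mapsto \bar{z}$ is $2\pi i r^2$, that is, $2i$ times the enclosed area rather than zero.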


Cauchy’s Theorems 

Analytic functions $f : \mathbb{C} \to X$ are profoundly different from the similar-looking
functions $f : \mathbb{R}^2 \to X$ that are simply differentiable over the reals. This is borne out
by a string of results discovered by Augustin Cauchy in the 19th century. We will
only present here the essential theorems (see [20] for a more thorough presentation).

Theorem 12.16 Cauchy’s Theorem 

Let $\Omega \subseteq \mathbb{C}$ be a bounded open set having a finite number of differentiable
curves as boundary. Let $f$ be a function from $\mathbb{C}$ into a Banach space, which
is analytic on and in $\Omega$, then along these boundary curves,

\[ \oint f(z)\,dz = 0. \]


Warning: the curves must be traversed in a consistent manner, say with the region
$\Omega$ to the left of each curve. A fully rigorous proof requires results that are too technical
to be presented in a simplified form (see [10]). These details will be disregarded in
favor of a more intuitive approach, both for this theorem and its corollaries.

Proof At any analytic point, /(z + h) = f(z) + f'(z)h + o(h), where o(Ji)/ h — »■ 0 
as h — > 0. So for any e > 0 and \h\ < S small enough, we have ||o(/j)|| < eS. For 
any closed curve □ inside a disk Bs (zo) C £2 we get, using Example 1 above. 


/ f(w)dw= / /(zo + z)dz 

/□ J □ 


= [ /(zo ) + /'(zo)z + o(z) dz 
JD 

= /(zo) [ 1 dz + /'(zo) f zdz + 

Jn Jn 




o(z) dz 


= / o(z) dz 


/□ 


/(io)dwj|| ^ / ||o(7z)|| dj ^ eS x Perimeter(D) 


/□ 


( 12 . 2 ) 
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Each point zo € £2 might need a different 8, but since Q is compact, there is a 
minimum <5 that works at all points (as in Proposition 6.17). 



The region $\Omega$ can be covered by an array of squares of side $\delta$, as shown in the diagram. The integral on the boundary $\partial\Omega$ can be split up into a sum of integrals along the squares that are within $\Omega$, except that when a square intersects the boundary $\partial\Omega$, the integral is partly along the square and partly along the boundary. Each tiny loop has perimeter at most $4\delta + l_i$, where $l_i$ is the length of that part of the boundary curve which lies inside the square.

If $\Omega$ is enclosed in a square of side $L$, there are at most $(L/\delta)^2$ squares in all, so the sum of the integrals is at most

$$\begin{aligned}
\Big\|\oint_{\partial\Omega} f(w)\,dw\Big\| &\le \sum_i \Big\|\oint_{\square_i} f(w)\,dw\Big\| \\
&\le \sum_i \epsilon\delta(4\delta + l_i) \qquad \text{by (12.2)} \\
&\le \big(4L^2 + \mathrm{Perimeter}(\Omega)\,\delta\big)\epsilon
\end{aligned}$$

With $\epsilon$ arbitrarily small, the integral must vanish. □
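
The contrast between analytic and non-analytic integrands can be checked numerically. The following is a minimal sketch, not part of the original text: the helper name `contour_integral`, the choice of the unit circle, and the number of sample points are all illustrative assumptions. It approximates $\oint f(z)\,dz$ by the trapezoidal rule in the angle.

```python
import cmath
import math

def contour_integral(f, center=0j, radius=1.0, n=2000):
    """Approximate the integral of f(z) dz counter-clockwise around a circle.

    Uses the trapezoidal rule in the angle t, with z(t) = center + radius*e^{it}
    and dz = i*radius*e^{it} dt.
    """
    total = 0j
    dt = 2 * math.pi / n
    for k in range(n):
        e = cmath.exp(1j * k * dt)
        z = center + radius * e
        total += f(z) * 1j * radius * e * dt
    return total

analytic = contour_integral(cmath.exp)            # exp is analytic everywhere
identity = contour_integral(lambda z: z)          # z is analytic everywhere
real_part = contour_integral(lambda z: z.real)    # Re(z) is not analytic

print(abs(analytic), abs(identity), real_part)
```

The first two values are zero up to rounding, as Cauchy's theorem predicts, while the integral of $\operatorname{Re}(z)$ comes out close to $\pi i$.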

Corollary 12.17 


If $f$ is analytic in the interior $\Omega$ of a simple closed curve $w$, then the integral $\int_a^b f(z)\,dz$ is well-defined when $a, b \in \Omega$, independent of the path taken (within $\Omega$).


Proof Any two paths inside $\Omega$, from $a$ to $b$, together form one or more simple closed paths, inside which $f$ is analytic. Hence the integral of $f$ on this closed loop is 0. □

One of the surprising results of Cauchy’s theorem is that the value of the integral $\oint f(z)\,dz$ does not depend on the bounding curve itself, but only on interior “distant” regions!

Fig. 12.1 Cauchy


Augustin Louis Cauchy (1789-1857) studied under Lagrange 
and Laplace as a military engineer, but decided to continue 
with mathematics. A staunch royalist, he replaced Monge at 
the Academie des Sciences after the fall of Napoleon. Although 
he published important papers in the fields of elasticity and 
waves, he became famous for his taught courses on analysis and 
calculus in the 1820s, in which he proved the diagonalization of 
real quadratic forms and pushed forward the new standards of 
rigor, e.g. limits, continuity, convergence. 


Corollary 12.18 Cauchy’s Residue Theorem 

The integral over a closed simple curve depends only on those regions inside where $f$ is not analytic,

$$\frac{1}{2\pi i}\oint f(z)\,dz = \sum_i \mathrm{Residue}_i(f).$$


Proof Enclose the non-analytic parts by a finite number of curves $w_i$ (the outer boundary curve $\gamma$ already does this, but it may be possible to further isolate the non-analytic parts) to form one analytic region, around which the integral is zero,

$$\frac{1}{2\pi i}\oint_\gamma f + \frac{1}{2\pi i}\sum_i \oint_{w_i} f = 0,$$

traversing each curve $w_i$ in a clockwise direction. The value of the integral around each non-analytic region in a counter-clockwise direction may be called a ‘residue’ of $f$. □

Because of this, the integral around a closed simple curve is often denoted by $\oint f(z)\,dz$, without reference to the (counter-clockwise) path taken, as long as it is clear from the context which non-analytic regions are included.

The simplest cases in which a function fails to be analytic are of isolated points, called isolated singularities. An example of an isolated singularity $a$ is a pole of order $n$ when the function is of the type $f(z)/(z - a)^n$ with $f$ analytic in a neighborhood of $a$ and $f(a) \ne 0$. A simple pole is a pole of order 1. All other isolated singularities are called essential singularities. We shall see later that the residue of a function at a pole of order $n + 1$ is $f^{(n)}(a)/n!$, but what can be proved here is the case for a simple pole:

Proposition 12.19 Cauchy’s Integral Formula 


If $f : \mathbb{C} \to X$ is analytic inside a simple closed path that contains $a$, then

$$f(a) = \frac{1}{2\pi i}\oint \frac{f(z)}{z - a}\,dz.$$


Proof The integrand $f(z)/(z - a)$ is analytic except at $z = a$, so by Cauchy’s theorem the path of integration can be taken to be a small circle of radius $r$ about $a$. As $f$ is analytic at $a$, we know $f(a + w) = f(a) + f'(a)w + o(w)$, so

$$\frac{f(z)}{z - a} = \frac{f(a + w)}{w} = \frac{f(a)}{w} + f'(a) + \frac{o(w)}{w}.$$

Integrating around a closed simple path eliminates the constant function $f'(a)$, and

$$\Big\|\frac{1}{2\pi i}\oint \frac{o(w)}{w}\,dz\Big\| \le \frac{1}{2\pi}\oint \frac{|o(w)|}{|w|}\,ds < \epsilon r$$

if $r$ is small enough that $|o(w)|/|w| < \epsilon$. Thus in the limit as we take smaller circles, only the term $\frac{1}{2\pi i}\oint \frac{f(a)}{w}\,dw = f(a)$ is left. □
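
Cauchy's integral formula lends itself to a quick numerical check. The sketch below is an illustration only: the function name `cauchy_formula`, the test function $e^z$, the point $a = 0.3 + 0.2i$ and the unit circle are assumptions chosen for convenience, not taken from the text.

```python
import cmath
import math

def cauchy_formula(f, a, center=0j, radius=1.0, n=400):
    """Approximate (1/(2*pi*i)) * integral of f(z)/(z - a) dz over a circle.

    The circle is sampled at n equally spaced angles (trapezoidal rule);
    the point a must lie strictly inside the circle.
    """
    total = 0j
    dt = 2 * math.pi / n
    for k in range(n):
        e = cmath.exp(1j * k * dt)
        z = center + radius * e
        total += f(z) / (z - a) * 1j * radius * e * dt
    return total / (2j * math.pi)

a = 0.3 + 0.2j
approx = cauchy_formula(cmath.exp, a)   # value recovered from the boundary
exact = cmath.exp(a)                    # value computed directly
print(abs(approx - exact))
```

The values of $f$ on the circle alone reproduce $f(a)$ to within rounding error, which is the content of the proposition.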

Examples 12.20 

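1. Interpreting the residue theorem in actual examples often yields results that would be harder to obtain otherwise. For example, the function $e^{iz}/z$ has a simple pole at 0 with residue 1. So using a contour as shown in the diagram, we obtain

$$2\pi i = \Big(\int_{-R}^{-r} + \int_r^R\Big)\frac{e^{ix}}{x}\,dx + \int_0^\pi e^{-R(\sin\theta - i\cos\theta)}\,i\,d\theta + \int_\pi^{2\pi} e^{ire^{i\theta}}\,i\,d\theta.$$

As $R \to \infty$ and $r \to 0$, the second integral vanishes and the third tends to $\pi i$, to give $\lim_{r \to 0,\, R \to \infty} \int_r^R \frac{\sin x}{x}\,dx = \pi/2$.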


2. Maximum modulus principle: If $f : \mathbb{C} \to \mathbb{C}$ is analytic and $|f|$ has a local maximum (or minimum) at $a$, then $f$ is constant in a neighborhood of $a$. It follows that on a compact subset $K$, $|f|$ attains its maximum and minimum at the boundary of $K$.
Proof Using a circular path of any radius $r$,

$$|f(a)| = \Big|\frac{1}{2\pi i}\oint \frac{f(z)}{z - a}\,dz\Big| \le \frac{1}{2\pi}\int_0^{2\pi} |f(a + re^{i\theta})|\,d\theta \le |f(a)|$$

$$\therefore\quad \int_0^{2\pi} |f(a)| - |f(z)|\,d\theta = 0$$

so $|f(z)| = |f(a)|$ within the disk, which in turn implies $f(z)$ is constant (Exercise 12.21(4)). Let $|f|^{-1}M$ be the subset of the interior of $K$ where $|f|$ attains the maximum $M := \max_{z \in K^\circ} |f(z)|$. It is open by the above, and closed in $K^\circ$ (Exercise 3.12(11)), hence must contain whole components of $K^\circ$, unless empty. By continuity, $|f|$ takes the same value $M$ on the boundary.

3. We say that a function $f$ has a zero of order $n$ at $a$ when $f(z) = (z - a)^n g(z)$, with $g(a) \ne 0$, $g$ analytic in a neighborhood of $a$.

If $f : \mathbb{C} \to \mathbb{C}$ has a zero (or pole) of order $n$ at $a$, then $f'/f$ has a simple pole at $a$ with residue $n$ (resp. $-n$):

$$\frac{f'(z)}{f(z)} = \frac{n(z - a)^{n-1}g(z) + (z - a)^n g'(z)}{(z - a)^n g(z)} = \frac{n}{z - a} + \frac{g'(z)}{g(z)},$$

($g'/g$ is analytic at $a$). Thus $\frac{1}{2\pi i}\oint \frac{f'}{f} = n$; more generally it equals the difference between the number of zeros and poles (counted with their order) inside the curve of integration.

4. Rouché’s theorem: If $p_n \to f$ inside a closed simple curve $\gamma$, with $f$ non-zero on $\gamma$, then $f$ and $p_n$ have the same number of zeros inside $\gamma$, from some $n$ onwards.

Proof As $|f|$ has a non-zero minimum on $\gamma$, there is an $n$ such that $|p_n - f| < |f|$ on $\gamma$. Let $F := p_n/f$; then $\frac{1}{2\pi i}\oint_\gamma \frac{F'}{F} = \frac{1}{2\pi i}\oint_{F \circ \gamma} \frac{1}{z}\,dz = 0$, since $F \circ \gamma$ is a closed curve that excludes 0 (it stays within the disk $|z - 1| < 1$). By the previous example, this implies that $F$ has the same number of zeros as poles, that is, the zeros of $p_n$ and of $f$ are the same in number.
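
The counting formula of Example 3 can be tried out numerically. In the sketch below, the test function $f(z) = z^2(z - 0.5)/(z + 0.25)$ is a hypothetical choice made for illustration, as are the names `count_zeros_minus_poles` and `log_derivative`. Inside the unit circle this $f$ has zeros of total order 3 and one simple pole, so the integral of $f'/f$ should return 2.

```python
import cmath
import math

def count_zeros_minus_poles(log_derivative, radius=1.0, n=4000):
    """Approximate (1/(2*pi*i)) * integral of f'/f around the circle |z| = radius."""
    total = 0j
    dt = 2 * math.pi / n
    for k in range(n):
        e = cmath.exp(1j * k * dt)
        z = radius * e
        total += log_derivative(z) * 1j * radius * e * dt
    return total / (2j * math.pi)

def log_derivative(z):
    """f'/f for the illustrative f(z) = z^2 (z - 0.5) / (z + 0.25)."""
    return 2 / z + 1 / (z - 0.5) - 1 / (z + 0.25)

value = count_zeros_minus_poles(log_derivative)
print(round(value.real))
```

The double zero at 0 contributes 2, the simple zero at 0.5 contributes 1, and the simple pole at $-0.25$ contributes $-1$.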

Exercises 12.21 

1. Show that, along any closed curve $\square$ in $\mathbb{C}$, $\oint_\square 1\,dz = 0$ and $\oint_\square z\,dz = 0$, but on a unit circle centered at the origin, $\oint \operatorname{Re}(z)\,dz = \pi i$.

2. If $f_n(z) \to f(z)$ in $X$ for all $z$ on a simple closed curve $w$, on which $f_n$ and $f$ are continuous, then $\int_w f_n(z)\,dz \to \int_w f(z)\,dz$.

3. Assuming $u$ and $v$ are sufficiently differentiable, deduce from the Cauchy-Riemann equations that the real and imaginary parts of an analytic function $f = u + iv$ are harmonic,

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \qquad \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 0.$$

4. Let $f : \mathbb{C} \to \mathbb{C}$ be analytic. Suppose $|f|$ is constant in some open set, then $f$ is constant. (Hint: Differentiate $|f|^2 = u^2 + v^2$.)

5. Find the poles and residues of (a) $e^{iz}/(z^2 + 1)$, (b) (c) $(\sin z)/z^2$ (first show $(\sin z)/z$ is analytic at 0).

6. $\frac{1}{2\pi i}\oint \frac{z^2 + 2}{z(z^2 - 1)}\,dz = -\frac{1}{2}$ along a simple closed counter-clockwise path that includes 0, 1, but not $-1$.

7. Show

(a) $\int_0^{2\pi} \frac{d\theta}{2 + \cos\theta} = 2\pi/\sqrt{3}$ using $f(z) := 1/(z^2 + 4z + 1)$, $z(\theta) = e^{i\theta}$

O’) /-°°oo dx = fusing 

(c) / 0 °° dx = f using /(z) := (1 - ^)/z 2 . 

8. By applying Example 3 to $f = e^g$, prove that the order of any of its poles must be zero. As this is impossible, the isolated singularities of $f$ must be essential singularities.

9. Use Rouché’s theorem to show that $\cosh z - 2\cos z$ has 2 zeros in the unit disk, assuming it equals its MacLaurin series.

Remarks 12.22 

1 . The first use of the Newton-Raphson method was by the “Babylonians” who used 
it to find square roots, x 2 = n. Newton’s method was initially restricted to finding 
roots of polynomials, and it was Simpson (1740) who described the iteration we 
use today. 

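2. Cauchy’s theorem for analytic functions is a special case of Green’s or Stokes’s theorem $\oint F \cdot dr = \iint \nabla \times F \cdot dA$. In this case, using the Cauchy-Riemann equations,

$$\begin{aligned}
\oint f(z)\,dz &= \oint (u + iv)(dx + i\,dy) = \oint u\,dx - v\,dy + i\oint v\,dx + u\,dy \\
&= -\iint \Big(\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\Big)\,dA + i\iint \Big(\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y}\Big)\,dA \\
&= 0.
\end{aligned}$$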


Part III 
Banach Algebras 



Chapter 13 

Banach Algebras 


13.1 Introduction 

We now turn our attention to the space of operators $B(X)$. We have seen that it is a Banach space when $X$ is one (Theorem 8.7), but additionally, one can compose, or multiply, operators in $B(X)$. This extra structure turns the vector space $B(X)$ into what is called an algebra. We shall mostly study these spaces as abstract algebras $\mathcal{X}$ without specific reference to them being spaces of operators, in order to include other examples of algebras and to make some of the proofs clearer. Nonetheless, $B(X)$ remains our primary interest, and accordingly, the elements of an algebra will be denoted in general by upper-case letters $T, S, \ldots$ to remind us of operators and to distinguish them from mere vectors $x$.

Definition 13.1 


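A unital Banach algebra $\mathcal{X}$ is a Banach space over $\mathbb{C}$ that has an associative multiplication of vectors with unity 1, such that for all $R, S, T \in \mathcal{X}$, $\lambda \in \mathbb{C}$,

$$(R + S)T = RT + ST, \qquad R(S + T) = RS + RT,$$
$$(\lambda S)T = \lambda(ST) = S(\lambda T),$$
$$\|ST\| \le \|S\|\|T\|, \qquad \|1\| = 1.$$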


Throughout this book, a Banach algebra will mean a unital Banach algebra. Of course, Banach algebras over $\mathbb{R}$ are also of interest, and all the results in this chapter apply to them in modified form; but complex scalars are necessary for an adequate spectral theory of $\mathcal{X}$.


J. Muscat, Functional Analysis, DOI: 10.1007/978-3-319-06728-5_13,
© Springer International Publishing Switzerland 2014

Easy Consequences

1. 1 is unique, because $1' = 1'1 = 1$ for any other unity $1'$.

2. $T$ is said to be invertible (or regular) when there is an element $S$, called its inverse, such that $ST = 1 = TS$. The inverse of $T$ is unique when it exists, and is denoted $T^{-1}$. If $AT = 1 = TB$ then $A = A(TB) = (AT)B = B$ so $A = T^{-1}$.

3. $(S + T)^2 = S^2 + ST + TS + T^2$, and more generally,

$$(S + T)^n = S^n + (ST^{n-1} + TST^{n-2} + \cdots + T^{n-1}S) + \cdots + T^n.$$

4. $\|T^n\| \le \|T\|^n$.

Proposition 13.2 


Multiplication, $(T, S) \mapsto TS$, is a differentiable map.


Proof In the identity

$$(T + H)(S + K) = TS + (TK + HS) + HK, \qquad (13.1)$$

the map $(H, K) \mapsto TK + HS$, $\mathcal{X}^2 \to \mathcal{X}$, is linear and continuous, and $HK$ is of lower order, since

$$\|TK + HS\| \le \|T\|\|K\| + \|S\|\|H\| \le \max(\|T\|, \|S\|)(\|H\| + \|K\|),$$
$$\|HK\| \le \|H\|\|K\| \le (\|H\| + \|K\|)^2 = \|(H, K)\|^2.$$


Of course, every differentiable map is continuous. □ 
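
The estimate in the proof can be observed on a concrete algebra. The sketch below assumes $2 \times 2$ real matrices with the operator norm induced by the $\infty$-norm (largest absolute row sum); the particular matrices `T`, `S`, `H0`, `K0` and the helper names are illustrative choices only. It measures the remainder left after subtracting the linear part $TK + HS$ and divides it by $\|H\| + \|K\|$.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def mat_sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def norm(a):
    """Operator norm induced by the infinity-norm: largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)

T = [[1.0, 2.0], [0.0, 1.0]]
S = [[0.0, 1.0], [3.0, -1.0]]
H0 = [[1.0, -1.0], [2.0, 0.5]]
K0 = [[0.5, 1.0], [-1.0, 2.0]]

ratios = []
for t in (1e-1, 1e-2, 1e-3):
    H, K = scale(t, H0), scale(t, K0)
    exact = mat_mul(mat_add(T, H), mat_add(S, K))
    linear = mat_add(mat_mul(T, S), mat_add(mat_mul(T, K), mat_mul(H, S)))
    remainder = mat_sub(exact, linear)   # equals HK by identity (13.1)
    ratios.append(norm(remainder) / (norm(H) + norm(K)))

print(ratios)
```

The ratios shrink in proportion to the size of the perturbation, which is what it means for $HK$ to be negligible compared with the linear term.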

Examples 13.3 

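1. $\mathbb{C}^N$ with the $\infty$-norm and the following pointwise multiplication and unity:

$$\begin{pmatrix} a_1 \\ \vdots \\ a_N \end{pmatrix}\begin{pmatrix} b_1 \\ \vdots \\ b_N \end{pmatrix} := \begin{pmatrix} a_1b_1 \\ \vdots \\ a_Nb_N \end{pmatrix}, \qquad 1 := \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}.$$

2. ► $\ell^\infty$ with pointwise multiplication $xy$, and unity $1 = (1, 1, \ldots)$ (Exercise 9.4(3)).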

3. C(K), the space of continuous functions on a compact set K, with pointwise 
multiplication fg(x) := f(x)g(x), and unity the constant function 1 . For exam- 
ple, C[0, 1] is a space of paths in the complex plane. 

4. ► $\ell^1$ with the convolution product; unity is $e_0 = (1, 0, \ldots)$ (Exercise 9.7(2)).

5. The space $L^1(\mathbb{R})$ with convolution as a product; although it does not have a unity, we can artificially add a $\delta$, called Dirac’s “function”, such that $\delta * f := f =: f * \delta$. (To make this rigorous, one needs to consider $L^1(\mathbb{R}) \times \mathbb{C}$ with elements $(f, a)$ representing $f + a\delta$.)

The above examples happen to be commutative, i.e., $ST = TS$ holds. But this is not assumed in general. For example, $T^2 - S^2 \ne (T - S)(T + S)$ in general.

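6. ► $B(X)$ for any Banach space $X$; the product is operator composition (Proposition 8.8).

7. ► If $\mathcal{X}$ and $\mathcal{Y}$ are Banach algebras, then so is $\mathcal{X} \times \mathcal{Y}$ with

$$(S_1, T_1)(S_2, T_2) := (S_1S_2, T_1T_2), \qquad 1 := (1_{\mathcal{X}}, 1_{\mathcal{Y}}), \qquad \|(S, T)\| := \max(\|S\|, \|T\|).$$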

8. Every normed algebra can be completed to a Banach algebra.

Proof Using the notation of Proposition 7.17, if $T = [T_n]$ and $S = [S_n]$, let $ST := [S_nT_n]$ and $1 := [1]$. Note that $S_nT_n$ is a Cauchy sequence by

$$\begin{aligned}
\|S_nT_n - S_mT_m\| &\le \|S_nT_n - S_nT_m\| + \|S_nT_m - S_mT_m\| \\
&\le \|S_n\|\|T_n - T_m\| + \|S_n - S_m\|\|T_m\| \\
&\le c(\|S_n - S_m\| + \|T_n - T_m\|).
\end{aligned}$$

Hence

$$R(ST) = [R_n(S_nT_n)] = [(R_nS_n)T_n] = (RS)T,$$
$$\lambda(ST) = [\lambda(S_nT_n)] = (\lambda S)T = S(\lambda T),$$
$$\|ST\| = \lim_{n \to \infty} \|S_nT_n\| \le \lim_{n \to \infty} \|S_n\|\|T_n\| = \|S\|\|T\|.$$

9. The polynomials $\mathbb{C}[z]$ on $B_{\mathbb{C}}$ with the $\infty$-norm form an incomplete algebra. As we shall see shortly, its completion is the space of analytic functions $C^\omega(B_{\mathbb{C}})$. More general is the tensor algebra, consisting of polynomials and series in $N$ non-commuting variables.

10. ► If $ST = 0$ and $S$ is invertible, then $T = 0$. But there may exist non-zero non-invertible elements $S$, $T$, called divisors of zero, for which $ST = 0$. Note that $TS$ need not also be 0, so $S$ and $T$ are more precisely called left and right divisors of zero, respectively.

11. ► The product of invertible elements is invertible, with $(ST)^{-1} = T^{-1}S^{-1}$. Also, $(T^{-1})^{-1} = T$. If $T^n$ is invertible, for some $n \ge 1$, then so is $T$.

But it is possible for two non-invertible elements to have an invertible product, i.e., $ST$ invertible $\not\Rightarrow$ $T$ invertible (unless $TR$ is also invertible for some $R$). In particular, $ST = 1$ by itself is not enough to ensure $T$ and $S$ are invertible. For example, in $B(\ell^1)$, the product of the (non-invertible) shift-operators is $LR = I$.

12. Suppose an element satisfies some non-zero polynomial, $p(T) = 0$. The unique such polynomial of minimum degree and leading coefficient 1 is called its minimal polynomial $p_m$. It divides all other polynomials $p$ such that $p(T) = 0$.
Proof There cannot be two minimal polynomials, $p_m$ and $p$, otherwise $p_m - p$ has a lesser degree than both and $p_m(T) - p(T) = 0$. If $p(T) = 0$, then $p = qp_m + r$ by the division algorithm of polynomials. As $r$ has a strictly smaller degree than $p_m$, yet $r(T) = p(T) - q(T)p_m(T) = 0$, it must be the zero polynomial.

13. The derivative of the map $T \mapsto ST$ is $S$. Similarly the derivative of $T \mapsto T^n$ is

$$H \mapsto HT^{n-1} + THT^{n-2} + \cdots + T^{n-1}H.$$

Because of commutativity, this simplifies to $(z^n)' = nz^{n-1}$ in $\mathbb{C}$. Thus, any polynomial in $T$ is differentiable in $T$.
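
The shift operators mentioned in Example 11 are easy to experiment with. The sketch below is illustrative only: it models a sequence by the list of its first few terms (assumed to be followed by zeros), and the function names `left_shift` and `right_shift` are hypothetical. It shows that $LR = I$ while $RL \ne I$, so neither shift is invertible.

```python
def left_shift(x):
    """L: drop the first term, (x0, x1, x2, ...) -> (x1, x2, ...)."""
    return x[1:]

def right_shift(x):
    """R: insert a zero in front, (x0, x1, ...) -> (0, x0, x1, ...)."""
    return [0] + x

x = [5, 7, 11, 13]                 # first terms of a sequence
lr = left_shift(right_shift(x))    # LR applied to x: recovers x
rl = right_shift(left_shift(x))    # RL applied to x: the first term is lost

print(lr, rl)
```

Applying $R$ then $L$ returns the original sequence, whereas applying $L$ then $R$ replaces the first term by zero.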


Subalgebras and Ideals 
Definition 13.4 


A subalgebra of an algebra $\mathcal{X}$ is a subset which is itself an algebra with the same (induced) addition, scalar multiplication, product, and unity. It is a Banach subalgebra (with the induced norm) when it is also complete.

An ideal is a linear subspace $\mathcal{I}$ such that $ST, TS \in \mathcal{I}$ for any $T \in \mathcal{X}$, $S \in \mathcal{I}$.


To show that a non-empty subset $\mathcal{A}$ is a subalgebra of $\mathcal{X}$, one need only show closure of the various operations, i.e., for any $S, T \in \mathcal{A}$: $S + T \in \mathcal{A}$, $\lambda T \in \mathcal{A}$, $ST \in \mathcal{A}$, $1 \in \mathcal{A}$. The required properties of the induced operations are obviously inherited from those of $\mathcal{X}$.

Examples 13.5 

1. $\mathbb{C}$ is embedded in every (complex) Banach algebra as $\mathbb{C}1 = \{ z1 : z \in \mathbb{C} \}$. In fact, it is customary to write $z$ when we mean $z1$.

2. An element $T$ generates the subalgebra of polynomials

$$\mathbb{C}[T] := \{ a_0 + a_1T + \cdots + a_nT^n : a_0, \ldots, a_n \in \mathbb{C},\ n \in \mathbb{N} \}.$$

More generally, a finite number of commuting elements $T_1, \ldots, T_n$ generate the commutative algebra $\mathbb{C}[T_1, \ldots, T_n]$, which may contain, for example, $1 - 2T_1 + T_1^2T_2$.

3. The algebra $\ell^\infty$ contains the closed ideal $c_0$.

Proof That $c_0$ is a closed linear subspace of $\ell^\infty$ is proved in Proposition 9.2. Let $(a_n) \in \ell^\infty$, $(b_n) \in c_0$, then $(a_n)(b_n) \in c_0$ since

$$\Big|\lim_{n \to \infty} a_nb_n\Big| \le \sup_n |a_n| \lim_{n \to \infty} |b_n| = 0.$$

We will see later that every commutative Banach algebra, except $\mathbb{C}$, has non-trivial ideals (Example 14.5(4)).

4. The center $\mathcal{X}' := \{ T : ST = TS,\ \forall S \in \mathcal{X} \}$ is a commutative closed subalgebra of $\mathcal{X}$.

Proof If $T_1, T_2 \in \mathcal{X}'$, then

$$S(T_1 + \lambda T_2) = ST_1 + \lambda ST_2 = T_1S + \lambda T_2S = (T_1 + \lambda T_2)S,$$
$$S(T_1T_2) = T_1ST_2 = (T_1T_2)S, \qquad S1 = S = 1S.$$

The algebra is commutative by definition of $\mathcal{X}'$.

5. ► Proper ideals do not contain 1, or any other invertible element $T$, otherwise it would have to contain every element $S = ST^{-1}T$. (However, as remarked in Example 13.3(11), the set of non-invertible elements need not be an ideal, or even a subspace.)

6. A closed ideal $\mathcal{I}$ gives rise to a quotient algebra $\mathcal{X}/\mathcal{I}$ with multiplication and unity defined by

$$(S + \mathcal{I})(T + \mathcal{I}) := ST + \mathcal{I}, \qquad 1 + \mathcal{I}.$$

7. A maximal ideal is a proper ideal $\mathcal{I}$ for which the only other ideal containing it is $\mathcal{X}$ itself,

$$\mathcal{I} \subseteq \mathcal{J} \Rightarrow \mathcal{J} = \mathcal{I} \text{ OR } \mathcal{J} = \mathcal{X}.$$

Maximal ideals are necessarily closed, assuming that the closure of a proper ideal is also a proper ideal (Example 13.22(3)).

8. * Every proper ideal is contained in a maximal ideal.

Proof Let $\mathcal{C}$ be the collection of all proper ideals that contain the proper ideal $\mathcal{I}$. By Hausdorff’s maximality principle, $\mathcal{C}$ contains a maximal chain of nested ideals $\mathcal{I}_\alpha$. Then $\mathcal{M} := \bigcup_\alpha \mathcal{I}_\alpha$ is an ideal, since if $T \in \mathcal{I}_\alpha$ and $S \in \mathcal{I}_\beta \subseteq \mathcal{I}_\alpha$, say, then $S + T \in \mathcal{I}_\alpha \subseteq \mathcal{M}$, and for any $S \in \mathcal{X}$, both $ST$ and $TS$ are in $\mathcal{I}_\alpha \subseteq \mathcal{M}$. It is obvious that $\mathcal{M}$ is proper and contains $\mathcal{I}$ since $1 \notin \mathcal{I}_\alpha$ for every $\alpha$, and that $\mathcal{M}$ is maximal since the chain $\mathcal{I}_\alpha$ is maximal.


Morphisms 
Definition 13.6 


A morphism $\Phi : \mathcal{X} \to \mathcal{Y}$ of Banach algebras is a continuous linear map (preserving limits, addition, and scaling) which preserves multiplication and the unity,

$$\Phi(ST) = \Phi(S)\Phi(T), \qquad \Phi(1_{\mathcal{X}}) = 1_{\mathcal{Y}}.$$

A character is a Banach algebra morphism $\phi : \mathcal{X} \to \mathbb{C}$. The set of characters, denoted by $\Delta$, is called the character space, or spectrum, of $\mathcal{X}$.


Examples 13.7 


1. Invertible elements of $\mathcal{X}$ are mapped by algebra morphisms to invertible elements of $\mathcal{Y}$,

$$\Phi(T)^{-1} = \Phi(T^{-1}),$$

since $\Phi(T)\Phi(T^{-1}) = \Phi(TT^{-1}) = \Phi(1) = 1$ and similarly, $\Phi(T^{-1})\Phi(T) = 1$.

2. ► The kernel of a Banach algebra morphism, $\ker\Phi := \{ T : \Phi(T) = 0 \}$, is a closed ideal. It is maximal when $\Phi \in \Delta$.

Proof If $\Phi(T) = 0$, then $\Phi(ST) = \Phi(S)\Phi(T) = 0$; similarly, $\Phi(TS) = 0$. Maximality: Let $\Phi : \mathcal{X} \to \mathbb{C}$ be a morphism, and let the ideal $\mathcal{I}$ contain $\ker\Phi$ as well as some $T \notin \ker\Phi$. Then $\Phi(T) = \lambda \ne 0$, and $\Phi(\lambda - T) = 0$; so $\lambda = (\lambda - T) + T \in \mathcal{I}$, and $\mathcal{I}$ must equal $\mathcal{X}$ (Example 13.5(5) above).

(Every maximal ideal of a commutative Banach algebra is of the type $\ker\phi$ with $\phi \in \Delta$, but the proof requires Exercise 13.10(19) and Example 14.5(4); see the proof of Theorem 14.38.)

3. An isomorphism of Banach algebras is defined to be an invertible morphism $\Phi : \mathcal{X} \to \mathcal{Y}$ such that $\Phi^{-1}$ is also a morphism. In fact, an invertible morphism is automatically an isomorphism.

4. An automorphism of a Banach algebra $\mathcal{X}$ is an isomorphism from $\mathcal{X}$ to itself. For example, the inner automorphisms $T \mapsto S^{-1}TS$, for any fixed invertible $S$.

5. Since $\mathbb{C}$ is commutative, commutators $[S, T] := ST - TS$ are mapped to 0 by characters (if they exist).

Representation in B(X)

Some mathematical theories contain a set of theorems stating that any abstract model 
of the theory can be represented concretely. For example, every group can be repre- 
sented by a permutation group, and every smooth manifold is embedded as a smooth 
“surface” of a Euclidean space. In this regard, every finite-dimensional Banach alge- 
bra can be embedded, or “faithfully represented”, as a matrix algebra, and more 
generally, we have the following representation theorem: 

Theorem 13.8 


Every Banach algebra can be embedded as a closed subalgebra of B(X), 
for some Banach space X. 


Proof The Banach space $X$ is the Banach algebra $\mathcal{X}$ itself without the product (although there may well be ‘smaller’ Banach spaces that fit the job). That is, the theorem claims that $\mathcal{X}$ is embedded in $B(\mathcal{X})$. To avoid confusion, we temporarily denote elements of $\mathcal{X}$ by lower-case letters, and the operators on them by upper-case letters.

Let $L_a(x) := ax$ be left-multiplication by $a$. Then $L_a \in B(\mathcal{X})$ since multiplication is distributive and continuous:

$$L_a(x + y) = a(x + y) = ax + ay = L_a(x) + L_a(y),$$
$$L_a(\lambda x) = a(\lambda x) = \lambda(ax) = \lambda L_a(x),$$
$$\|L_a(x)\| = \|ax\| \le \|a\|\|x\|,$$

so that $\|L_a\| \le \|a\|$. Furthermore,

$$L_{a+b}(x) = (a + b)x = ax + bx = L_a(x) + L_b(x), \qquad L_1(x) = 1x = x = I(x),$$
$$L_{\lambda a}(x) = (\lambda a)x = \lambda L_a(x), \qquad L_a(1) = a1 = a,$$
$$L_{ab}(x) = (ab)x = a(bx) = L_aL_b(x),$$

so $\|a\| = \|L_a1\| \le \|L_a\|\|1\| = \|L_a\|$ and $\|L_a\| = \|a\|$. These show that the mapping $L : \mathcal{X} \to B(\mathcal{X})$ defined by $L : a \mapsto L_a$ is an isometric morphism of Banach algebras. In fact, the space of such operators, $\operatorname{im} L$, is a closed subalgebra of $B(\mathcal{X})$ since isometries preserve completeness (Exercise 4.17(5)). Note that all the Banach algebra axioms have been used. □
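
The embedding $a \mapsto L_a$ can be made concrete on a small example. The sketch below assumes the group algebra of the cyclic group $\mathbb{Z}_3$ with the 1-norm (cyclic convolution as product); the choice of group, the sample elements `a` and `b`, and all helper names are illustrative assumptions. It builds the matrix of $L_a$ and checks that $L_{ab} = L_aL_b$ and $\|L_a\| = \|a\|$.

```python
N = 3  # order of the cyclic group Z_3, an illustrative choice

def conv(a, b):
    """Cyclic convolution: the product in the group algebra of Z_N."""
    return [sum(a[j] * b[(i - j) % N] for j in range(N)) for i in range(N)]

def left_mult_matrix(a):
    """Matrix of L_a : x -> a * x in the standard basis; column j is a * e_j."""
    cols = [conv(a, [1 if i == j else 0 for i in range(N)]) for j in range(N)]
    return [[cols[j][i] for j in range(N)] for i in range(N)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def op_norm_1(A):
    """Operator norm induced by the 1-norm: largest absolute column sum."""
    return max(sum(abs(A[i][j]) for i in range(N)) for j in range(N))

a = [1, 2j, -1]
b = [0.5, 0, 3]
L_a, L_b = left_mult_matrix(a), left_mult_matrix(b)
L_ab = left_mult_matrix(conv(a, b))
product = mat_mul(L_a, L_b)

max_diff = max(abs(product[i][j] - L_ab[i][j]) for i in range(N) for j in range(N))
norm_a = sum(abs(c) for c in a)
print(max_diff, op_norm_1(L_a), norm_a)
```

Here each $L_a$ is a circulant matrix, so the abstract algebra is faithfully represented by a matrix algebra acting on the algebra itself.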

As one may anticipate, B(X) and B(Y) are not isomorphic as Banach algebras, 
when X and Y are not isomorphic as Banach spaces. The proof, however, is not as 
obvious as one might expect.

Theorem 13.9

Let $X$ and $Y$ be Banach spaces. A Banach algebra isomorphism $J : B(X) \to B(Y)$ induces a Banach space isomorphism $L : X \to Y$, such that

$$J(T) = LTL^{-1}.$$

Thus, every automorphism of $B(X)$ is inner.


Proof The idea is to establish a 1-1 correspondence between vectors $x \in X$ and certain projection-like operators $P_x \in B(X)$, and similarly $y \leftrightarrow R_y$ for $Y$; using the given mapping $J : T \mapsto \tilde{T}$, the sought isomorphism would then be

$$L : x \mapsto P_x \mapsto R_y \mapsto y.$$

The correspondence $x \leftrightarrow P_x$: For the remainder of the proof, fix a vector $a \in X$, $a \ne 0$, and a functional $\phi \in X^*$ such that $\phi a = 1$. Multiplying $x$ by $\phi$ gives an operator $P_x := x\phi$, that is, $P_xu := (\phi u)x$; conversely, multiplying $P_x$ with $a$ gives back the vector $P_xa = x\phi a = x$. The crucial characteristic of these operators is, for any $T \in B(X)$,

$$TP_x = Tx\phi = (Tx)\phi = P_{Tx}, \qquad P_{x_1 + x_2} = (x_1 + x_2)\phi = P_{x_1} + P_{x_2}.$$

In particular $P_xP_a = x\phi a\phi = P_x$. Note that $\|P_x\| = \|x\phi\| \le \|x\|\|\phi\|$ and $\|x\| = \|P_xa\| \le \|P_x\|\|a\|$. Thus, $P : X \to B(X)$, $x \mapsto P_x$ is an embedding.

The isomorphism $J$ maps $P_x \in B(X)$ to a similar operator $R_y \in B(Y)$: The relation $P_a^2 = P_a$ is preserved by $J$, so $\tilde{P}_a := J(P_a)$ is a non-zero projection in $B(Y)$. Pick $b \in \operatorname{im}\tilde{P}_a$ and $\psi \in Y^*$ such that $\psi b = 1$ and $\psi\ker\tilde{P}_a = 0$ (Proposition 11.18), and define $R_y := y\psi$. $R_y$ satisfies analogous properties as $P_x$, such as $R_yb = y$ and $TR_y = R_{Ty}$. Now suppose $c \in \operatorname{im}\tilde{P}_a$, and let $T \in B(X)$ correspond to $R_c \in B(Y)$ under $J$: then $J$ transforms the identity

$$P_aTP_a = a(\phi Ta)\phi = \lambda P_a, \quad \text{where } \lambda = \phi Ta,$$

to $\tilde{P}_aR_c\tilde{P}_a = \lambda\tilde{P}_a$, so $\operatorname{im}\tilde{P}_a = [\![b]\!]$ since

$$c = \tilde{P}_ac = \tilde{P}_aR_cb = \tilde{P}_aR_c\tilde{P}_ab = \lambda\tilde{P}_ab = \lambda b.$$

Thus the projections $\tilde{P}_a$ and $R_b$ have the same image and the same kernel, and we can conclude that they are equal to each other.

Hence, the identity $P_x = P_xP_a$ becomes, in $B(Y)$,

$$\tilde{P}_x = \tilde{P}_xR_b = R_{\tilde{P}_xb} = R_y, \quad \text{where } y = \tilde{P}_xb.$$

The map $L : x \mapsto y = J(P_x)b$ is an isomorphism: That $L$ is linear, continuous, and 1-1 follow from:

$$L(x_1 + x_2) = J(P_{x_1 + x_2})b = J(P_{x_1} + P_{x_2})b = L(x_1) + L(x_2),$$
$$L(\lambda x) = J(P_{\lambda x})b = J(\lambda P_x)b = \lambda L(x),$$
$$\|Lx\| = \|J(P_x)b\| \le \|J\|\|\phi\|\|x\|\|b\|,$$
$$Lx = 0 \Leftrightarrow J(P_x)b = 0 \Leftrightarrow J(P_x) = 0 \Leftrightarrow P_x = 0 \Leftrightarrow x = 0.$$

Given any $y \in Y$, $J^{-1}$ maps the identity $R_y = R_yR_b$ to $S = SP_a = P_{Sa}$. So for $x := Sa$,

$$Lx = J(P_{Sa})b = R_yb = y,$$

and $L$ is onto. By the open mapping theorem (Theorem 11.1), $L$ is an isomorphism.

$\tilde{T} = LTL^{-1}$: $J$ maps the identity $TP_x = P_{Tx}$ to $\tilde{T}R_{L(x)} = R_{L(Tx)}$. Multiplying by $b$ to get the vector form, this reads $\tilde{T}Lx = LTx$ for all $x \in X$.

When $X = Y$, then $L \in B(X)$, and $J$ is an inner automorphism. □


Exercises 13.10 

1. Banach algebras of square matrices abound: the sets of matrices of type 



(ofl)’(ia), ° r ( U h are eac ^ c l° sec l under addition and multiplication, 

and are Banach subalgebras of B( C 2 ). 


2. C := C 2 with 


(i) 


lth (b) 00 := (ad + Z + bd) is a Banach algebra, with unity 


ad + be + bd 

. (Hint: it is a matrix algebra in disguise.) 


3. Find examples of $2 \times 2$ matrix divisors of zero, $ST = 0 \ne TS$.

4. Show that in an $N$-dimensional algebra, every element has a minimal polynomial of degree at most $N$; e.g. every square matrix $A$ has a minimal polynomial. Show also how the Gram-Schmidt process (with respect to the inner product of Example 10.2(2)) can be applied to the sequence $I, A, A^2, \ldots$ to construct this minimal polynomial.

5. An idempotent satisfies $P^2 = P$. They are the projections in $B(X)$; what are they in $\mathbb{C}^N$ and $\ell^\infty$? The idempotents of $C[0, 1]$ are trivial. Show further that $P\mathcal{X}P$ is an algebra with unity $P$, called a “reduced algebra”.

6. A nilpotent satisfies Q n = 0 for some n, e.g. Cl) and 0-0 . In C N , 

£°°, and C[0, 1], there are no nilpotents except zero. Find all the 2x2 matrix 
nilpotents of order 2, i.e., Q 2 — 0. 
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7. An element is cyclic when T n = 1 for some n, e.g. ^ ^ J . In C N and l 00 they 

are sequences whose terms are of the type e 2jr ""/" for a fixed n. 

8. The product of differentiable functions is again differentiable, with 

(fg)'(T)H = [ f'(T)H]g(T ) + f(T)[g'(T)H]. 

This can be written in short as the familiar product rule ( f g )' = f'g + fg', 
provided it is remembered that the vector H is acted upon by each derivative. 

9. If $F : \mathbb{R} \to \mathcal{X}$ is integrable and $T \in \mathcal{X}$, then $\int F(t)T\,dt = (\int F)T$ (first show it true for simple functions).

10. * Group Algebra: Let $G$ be a finite group of order $N$, and $\{ e_g : g \in G \}$ be an orthonormal basis for $\mathbb{C}^N$; define $e_g * e_h := e_{gh}$, and extend the product to all other vectors by distributivity. The result is a Banach algebra $\mathbb{C}^G$ (or $A(G)$) with unity $e_1$ and the 1-norm. Every basis element is cyclic.

For example, the cyclic group $\{ 1, g : g^2 = 1 \}$ gives rise to an algebra generated by $e_1 := \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $e_g := \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, and the product

$$\begin{pmatrix} a \\ b \end{pmatrix} * \begin{pmatrix} c \\ d \end{pmatrix} := (ae_1 + be_g) * (ce_1 + de_g) = \begin{pmatrix} ac + bd \\ bc + ad \end{pmatrix}.$$

11. The closure of a subalgebra is an algebra (use continuity of the product). 

12. If X and J are ideals, then so are X + J and X. 

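13. The center of $B(X)$ is $\mathbb{C}$. (Hint: Consider projections for any $x \in X$, $\phi \in X^*$.)

14. ► The centralizer or commutant of a subset $\mathcal{A} \subseteq \mathcal{X}$,

$$\mathcal{A}' := \{ T : AT = TA,\ \forall A \in \mathcal{A} \}$$

is a closed subalgebra of $\mathcal{X}$. (In fact, when $\mathcal{X} = B(H)$, $\mathcal{A}'$ is weakly closed by Exercise 11.42(11a).)

Prove:

(a) $\mathcal{A} \subseteq \mathcal{B} \Rightarrow \mathcal{B}' \subseteq \mathcal{A}'$,

(b) $\mathcal{A} \subseteq \mathcal{A}''$ and $\mathcal{A}''' = \mathcal{A}'$,

(c) If $T \in \mathcal{A}'$ is invertible in $\mathcal{X}$ then $T^{-1} \in \mathcal{A}'$,

(d) If elements of $\mathcal{A}$ commute, then $\mathcal{A} \subseteq \mathcal{A}'$ and $\mathcal{A}''$ is a commutative Banach algebra.

15. A left-ideal is a linear subspace $\mathcal{I}$ of $\mathcal{X}$ such that $T\mathcal{I} \subseteq \mathcal{I}$ for any $T \in \mathcal{X}$. Similarly, for a right-ideal, $\mathcal{I}T \subseteq \mathcal{I}$. For example, $\mathcal{X}S$ is a left-ideal, and $S\mathcal{X}$ is a right-ideal, but $\mathcal{X}S\mathcal{X}$ need not be an ideal. Instead, the ideal generated by $S$ is $[\![\mathcal{X}S\mathcal{X}]\!]$.

16. Let $A$ be a closed subset of $[0, 1]$, then

$$\mathcal{I}_A := \{ f \in C[0, 1] : \forall x \in A,\ f(x) = 0 \}$$

is a closed ideal of $C[0, 1]$. Conversely, given a closed ideal $\mathcal{I}$ of $C[0, 1]$, let

$$A := \{ x \in [0, 1] : \forall f \in \mathcal{I},\ f(x) = 0 \},$$

then $\mathcal{I} = \mathcal{I}_A$. What are the maximal ideals?

17. Let $\mathcal{I}_A$ be a closed ideal of $C[0, 1]$, where $A$ is a closed subset. Then the mapping $f + \mathcal{I}_A \mapsto f|_A$ is an isomorphism $C[0, 1]/\mathcal{I}_A \cong C(A)$.

18. An algebra morphism $\Phi : \mathcal{X} \to \mathcal{Y}$ ‘pulls’ ideals $\mathcal{I}$ in $\mathcal{Y}$ to ideals in $\mathcal{X}$.

19. If $\mathcal{I}$ is a closed ideal, then $\Phi(T) := T + \mathcal{I}$ gives a Banach algebra morphism $\Phi : \mathcal{X} \to \mathcal{X}/\mathcal{I}$ with kernel $\ker\Phi = \mathcal{I}$.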

20. The mapping $\sum_n a_nz^n \mapsto (a_n)$ from the set of power series converging absolutely on the closed unit disk $D$ of $\mathbb{C}$, considered as a subspace of $C(D)$, to $\ell^1$ is a 1-1 Banach algebra morphism.

21. Let $\sigma$ be a permutation of $1, \ldots, N$; then the mapping defined by $(z_1, \ldots, z_N) \mapsto (z_{\sigma(1)}, \ldots, z_{\sigma(N)})$ is an automorphism of $\mathbb{C}^N$.

22. For the group algebra $\mathbb{C}^G$, let $\sigma$ be an automorphism of the group $G$; then $e_g \mapsto e_{\sigma(g)}$ induces an automorphism on $\mathbb{C}^G$.

23. The algebra C N is embedded in B(JC N ) as diagonal matrices. C is represented 
by the matrices ^ ^ a ^ ^ ^ ■ The group algebra C G is generated by the Cayley 
matrices of G. 


24. Show that every Banach algebra of dimension 2 (over C) can be represented by 
the matrices generated from I and (m) , where a is a fixed number and (1 is 
0 or 1. What are ot and /3 for the group algebra generated by { 1, g : g 2 = 1 }? 

25. Let $\mathcal{X}$ be a Banach algebra contained in $B(X)$. Its unity $P = P^2$ is a projection, so $X = M \oplus N$ where $M = \operatorname{im} P$. For every $T \in \mathcal{X}$, $PT = T = TP$ implies $M$ is $T$-invariant and $TN = 0$, hence $\mathcal{X}$ acts on $M$.

13.2 Power Series
Definition 13.11


A power series is a series $\sum_n a_nT^n$ where $a_n \in \mathbb{C}$ and $T \in \mathcal{X}$.


Recall that the root test can help determine whether such a series converges or not: if $\|a_nT^n\|^{1/n} = |a_n|^{1/n}\|T^n\|^{1/n}$ converges to a number less than 1, then the power series converges. It is important to know that $\|T^n\|^{1/n}$ converges:

Proposition 13.12


For any $T$ in a Banach algebra, the sequence $\|T^n\|^{1/n}$ converges to a number denoted by $\rho(T)$, where

$$\forall n \in \mathbb{N}, \quad \rho(T) \le \|T^n\|^{1/n} \le \|T\|.$$


Proof It is clear that $0 \le \|T^n\|^{1/n} \le \|T\|$. Let $\rho(T)$ be the infimum value of $\|T^n\|^{1/n}$, meaning that $\|T^n\|^{1/n}$ is bounded below by $\rho(T)$ and

$$\forall\epsilon > 0,\ \exists N, \quad \rho(T) \le \|T^N\|^{1/N} < \rho(T) + \epsilon.$$

Although the sequence $\|T^n\|^{1/n}$ is not necessarily decreasing towards $\rho(T)$, notice that $\|T^{qm}\|^{1/qm} \le \|T^m\|^{1/m}$. For any $n$, let $n = q_nN + r_n$ with $0 \le r_n < N$ (by the remainder theorem), then $0 \le r_n/n < N/n \to 0$ and $q_n/n = \frac{1}{N}(1 - \frac{r_n}{n}) \to \frac{1}{N}$ as $n \to \infty$, so that

$$\rho(T) \le \|T^n\|^{1/n} = \|T^{q_nN}T^{r_n}\|^{1/n} \le \|T^N\|^{q_n/n}\|T\|^{r_n/n} \to \|T^N\|^{1/N} < \rho(T) + \epsilon.$$

Since $\epsilon$ is arbitrarily small, this shows that $\|T^n\|^{1/n} \to \rho(T)$ from above. □
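
The convergence of $\|T^n\|^{1/n}$ can be watched on a small example. The sketch below assumes a particular $2 \times 2$ matrix, chosen for illustration, whose norm is large although its only eigenvalue is $0.5$; the norm used is the operator norm induced by the $\infty$-norm, and the helper names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def norm(a):
    """Operator norm induced by the infinity-norm: largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)

T = [[0.5, 10.0], [0.0, 0.5]]    # illustrative matrix with norm 10.5
power = [[1.0, 0.0], [0.0, 1.0]]
roots = {}
for n in range(1, 401):
    power = mat_mul(power, T)            # power now holds T^n
    roots[n] = norm(power) ** (1.0 / n)  # the n-th root of ||T^n||

print(norm(T), roots[1], roots[10], roots[100], roots[400])
```

The roots start at $\|T\| = 10.5$ and decrease towards $0.5$ while never dropping below it, and the powers $T^n$ themselves tend to zero even though $\|T\| > 1$.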

Examples 13.13 

1. ► (a) $\rho(1) = 1$, (b) $\rho(\lambda T) = |\lambda|\rho(T)$, (c) $\rho(ST) = \rho(TS)$, (d) $\rho(T^n) = \rho(T)^n$, since

$$\|1\|^{1/n} = 1, \qquad \|\lambda^nT^n\|^{1/n} = |\lambda|\|T^n\|^{1/n},$$
$$\rho(ST) \le \|(ST)^n\|^{1/n} \le \|S\|^{1/n}\|(TS)^{n-1}\|^{1/n}\|T\|^{1/n} \to \rho(TS),$$
$$\|(T^n)^m\|^{1/m} = \|T^{nm}\|^{1/m} \to \rho(T)^n \text{ as } m \to \infty.$$

But $\rho(T)$ may be 0 without $T = 0$; and $\rho(S + T) \not\le \rho(S) + \rho(T)$ in general (e.g. $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, $\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$). So $\rho$ is not usually a norm on $\mathcal{X}$.

2. ► $\rho(T) = \|T\| \Leftrightarrow \|T^n\| = \|T\|^n$, $\forall n \in \mathbb{N}$, since $\|T\| = \rho(T) \le \|T^n\|^{1/n} \le \|T\|$.

3. ► If $\rho(T) < 1$, then $T^n \to 0$ (even though $\|T\|$ may be bigger than 1). If $\rho(T) > 1$, then $T^n \to \infty$.

Proof For $\epsilon$ small enough and $n$ large enough,

$$\|T^n\|^{1/n} < \rho(T) + \epsilon < 1 \ \Rightarrow\ \|T^n\| < (\rho(T) + \epsilon)^n \to 0, \text{ as } n \to \infty,$$
$$\|T^n\|^{1/n} \ge \rho(T) > 1 + \epsilon \ \Rightarrow\ \|T^n\| > (1 + \epsilon)^n \to \infty.$$

Theorem 13.14 Cauchy-Hadamard

The power series $\sum_n a_nT^n$, where $a_n \in \mathbb{C}$, $T \in \mathcal{X}$,

• converges absolutely when $\rho(T) < R$, and

• diverges when $\rho(T) > R$,

where $R := 1/\limsup_n |a_n|^{1/n}$ is called the radius of convergence of the series.


Proof This is a simple application of the root test. The $n$th root of the general term satisfies

$$\limsup_n \|a_nT^n\|^{1/n} = \limsup_n |a_n|^{1/n}\rho(T) = \rho(T)/R.$$

Thus, if $\rho(T) < R$, then the series converges absolutely, while if $\rho(T) > R$, then it diverges. Assuming $\mathcal{X}$ is complete, the power series converges or diverges accordingly. □

Examples 13.15 

1. Ratio test: If $|a_n|/|a_{n+1}| \to R$ then so does $|a_n|^{-1/n}$ (Section 7.5), hence $R$ would be the radius of convergence of $\sum_n a_nT^n$.

2. Some aspects of power series may seem mysterious from the point of view of real numbers: The series $1 - x^2 + x^4 - x^6 + \cdots$ has a radius of convergence of 1 and converges to $\frac{1}{1+x^2}$, which takes a finite value at all $x \in \mathbb{R}$ (but not at $x = i$). Moreover the same series can also be written as $(5 - (4 - x^2))^{-1} = \frac{1}{5}\sum_n \left(\frac{4-x^2}{5}\right)^n$, yet in this form it converges in the larger range $-3 < x < 3$.

3. The theorem also applies to power series $\sum_n A_nz^n$, where $A_n$ is a sequence of elements in $X$. The radius of convergence is then $1/\limsup_n \|A_n\|^{1/n}$.

4. When $|a_n| \le c$ for all $n$, then $a_2T^2 + a_3T^3 + \cdots = o(T)$ for small $T$, since it is bounded above by $c\|T\|^2/(1 - \|T\|)$.
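
Example 2 can be checked numerically. The sketch below compares partial sums of the two expansions of $1/(1+x^2)$ at the sample point $x = 2.5$, which lies outside the unit interval but inside $-3 < x < 3$; the function names and the sample point are illustrative choices, not taken from the text.

```python
def geometric_form(x, terms):
    """Partial sum of 1 - x^2 + x^4 - ..., i.e. powers of -x^2."""
    return sum((-x * x) ** n for n in range(terms))

def recentred_form(x, terms):
    """Partial sum of (1/5) * sum_n ((4 - x^2)/5)^n."""
    u = (4.0 - x * x) / 5.0
    return sum(u ** n for n in range(terms)) / 5.0

x = 2.5
exact = 1.0 / (1.0 + x * x)
approx = recentred_form(x, 60)        # ratio |4 - x^2|/5 = 0.45 < 1
diverging = abs(geometric_form(x, 20))  # ratio x^2 = 6.25 > 1
print(exact, approx, diverging)
```

The recentred series reproduces $1/(1+x^2)$ to rounding error, while the partial sums of the original series grow without bound at the same point.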

When can a function be written as a power series? We wish to establish that being 
analytic in a neighborhood of 0 is a necessary and sufficient condition. The necessary 
part is the content of the following proposition, but sufficiency will be shown later 
(Theorem 13.26). 

Proposition 13.16 

A power series $f(z) := \sum_{n=0}^\infty a_nz^n$ is analytic strictly within its radius of convergence $R$, and

$$f'(z) = \sum_{n=1}^\infty na_nz^{n-1}.$$

Proof First of all, the power series $\sum_n a_nnz^{n-1}$ converges, with the same radius of convergence $R$ as $\sum_n a_nz^n$,

$$\limsup_n |na_n|^{1/n} = \lim_n n^{1/n}\,\limsup_n |a_n|^{1/n} = 1/R \quad \text{(Exercise 3.5(1d))}.$$

For each individual term of the given power series,

$$(z+h)^n = z^n + nz^{n-1}h + o_n(h).$$

It needs to be shown that $|\sum_n a_no_n(h)|/|h| \to 0$ as $h \to 0$. One trick is to find an alternative way of expanding $(z+h)^n$ as follows:

$$\begin{aligned}
(z+h)^n &= (z+h)^{n-1}h + (z+h)^{n-1}z \\
&= (z+h)^{n-1}h + (z+h)^{n-2}zh + (z+h)^{n-2}z^2 \\
&= (z+h)^{n-1}h + \cdots + (z+h)^{n-k}z^{k-1}h + \cdots + z^{n-1}h + z^n,
\end{aligned}$$

$$\therefore\ |(z+h)^n - z^n| \le (|z+h|^{n-1} + \cdots + |z|^{n-1})|h| \le nr^{n-1}|h|, \qquad (13.2)$$

where $r$ is larger than $|z| + |h|$ but smaller than $R$. Now,

$$\begin{aligned}
o_n(h) &= (z+h)^n - z^n - nz^{n-1}h \\
&= (z+h)^{n-1}h + \cdots + (z+h)^{n-k}z^{k-1}h + \cdots + z^{n-1}h \\
&\quad - z^{n-1}h - \cdots - z^{n-k}z^{k-1}h - \cdots - z^{n-1}h \\
&= \sum_{k=1}^{n} \big((z+h)^{n-k} - z^{n-k}\big)z^{k-1}h,
\end{aligned}$$

so, by (13.2),

$$|o_n(h)| \le (n-1)r^{n-2}|h|^2 + \cdots + r^{n-2}|h|^2 = \frac{n(n-1)}{2}r^{n-2}|h|^2.$$

But the series $c := \sum_{n=2}^\infty |a_n|\frac{n(n-1)}{2}r^{n-2}$ converges for $r < R$, so

$$\Big|\sum_{n=0}^\infty a_no_n(h)\Big| \le \sum_{n=2}^\infty |a_n||o_n(h)| \le c|h|^2,$$

which proves that the remainder term $\sum_n a_no_n(h)$ is $o(h)$. □

There are two important consequences: Since differentiating a power series gives 
another power series with the same radius of convergence, then we can differentiate 
repeatedly. Secondly, we know that polynomials are distinct as functions on C when 
they have different coefficients; this property remains valid for power series: If a 
function can be written as a power series, then its coefficients are unique to it. 

Proposition 13.17 


Assuming a strictly positive radius of convergence, 

(i) a power series $f(z) = \sum_{n=0}^\infty a_nz^n$ is infinitely many times differentiable, and

$$a_n = \frac{f^{(n)}(0)}{n!},$$

(ii) distinct power series do not have identical coefficients.

By distinct power series is meant $\sum_n b_nT^n \ne \sum_n c_nT^n$ for at least one $T$.

Proof (i) By induction on $n$, $f^{(n)}$ has the power series

$$f^{(n)}(z) = n!\,a_n + (n+1)!\,a_{n+1}z + \frac{(n+2)!}{2!}a_{n+2}z^2 + \cdots$$

Substituting $z = 0$ gives the stated formula.

(ii) Suppose $\sum_n b_nT^n = \sum_n c_nT^n$ for all $T$ such that $\rho(T) < R$, the smaller of their radii of convergence. By taking the difference of the two series, it is enough to show that if $f(z) := \sum_n a_nz^n = 0$ for all $z \in B_R(0)$, then $a_n = 0$ for all $n$. But this is immediate from (i) since $f^{(n)}(0) = 0$ in this case. □
The Exponential and Logarithm Maps

There are a couple of power series of supreme importance. As motivation, consider the possibility of converting addition in a Banach algebra to multiplication,

$$f(x + y) = f(x)f(y), \qquad f(0) = 1.$$

Apart from the constant function $f = 1$, are there any others? If $f$ exists, it would have to satisfy a number of properties:

(a) $f(nx) = f(x)^n$, $f(-x) = f(x)^{-1}$,

(b) When the algebra is $\mathbb{R}$, $f(m/n) = a^{m/n}$ where $a := f(1) > 0$ (Hint: $f(n/n) = f(1/n)^n$),

(c) $f$ is uniformly continuous on $\mathbb{Q} \cap [0, 1]$, so it can be extended to a continuous function on $\mathbb{R}$, usually denoted by $f(x) = a^x$,

(d) $f'(x) = f'(0)f(x)$ if $f$ is differentiable at 0, since $f(h) = 1 + f'(0)h + o(h)$ so

$$f(x + h) = f(x)f(h) = f(x) + f(x)f'(0)h + o(h);$$

consequently $f$ is infinitely many times differentiable with $f^{(n)}(x) = f'(0)^nf(x)$. Taking the simplest case $f'(0) = 1$ (so $f^{(n)}(0) = 1$) leads to the following definition:

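The exponential function is defined by

$$e^T := 1 + T + \frac{T^2}{2!} + \frac{T^3}{3!} + \cdots = \sum_{n=0}^\infty \frac{T^n}{n!}.$$

Its radius of convergence is $\liminf_n |a_n|^{-1/n} = \lim_{n\to\infty} \frac{(n+1)!}{n!} = \infty$ by the ratio test, so $e^T$ exists for any $T$ and satisfies $\|e^T\| \le e^{\|T\|}$.

Similarly, starting with $f(xy) = f(x) + f(y)$, we are led to the logarithm function, defined by

$$\log(1 + T) := T - \frac{T^2}{2} + \frac{T^3}{3} - \cdots = \sum_{n=1}^\infty \frac{(-1)^{n+1}}{n}T^n,$$

with radius of convergence $\liminf_n |a_n|^{-1/n} = \lim_{n\to\infty} \frac{n+1}{n} = 1$.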

Proposition 13.18 

When $S$, $T$ commute, $e^{S+T} = e^Se^T$. For $\rho(T) < 1$, $e^{\log(1+T)} = 1 + T$.





Proof (i) The product $e^Se^T$ can be obtained in table form, with the terms of $e^S = 1 + S + \frac{1}{2}S^2 + \cdots$ along one side and those of $e^T = 1 + T + \frac{1}{2}T^2 + \cdots$ along the other.

The general term in this array is $\frac{1}{n!\,m!}S^nT^m = \frac{1}{N!}\binom{N}{n}S^nT^{N-n}$, where $N = n + m$ is the $N$th diagonal from the top left corner. This is precisely the $n$th term of the expansion of $\frac{1}{N!}(S + T)^N$ when $S$ and $T$ commute, so the array sum is $e^{S+T}$.

(ii) The second part can be (tediously) proved by making a power series expansion as above (Exercise 13.19(8)). We defer the proof until we have better tools available (Example 13.30(3)). □
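
The role of commutativity in part (i) can be illustrated with small matrices. The sketch below sums the exponential series directly; the helper names and the four sample matrices are hypothetical choices made for this illustration (the pair `S`, `T` commutes because both are combinations of the identity and one nilpotent matrix, while `A`, `B` do not commute).

```python
def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matadd(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def expm(A, terms=40):
    """Partial sum 1 + A + A^2/2! + ... of the exponential series."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        # term becomes A^n / n!
        term = [[x / n for x in row] for row in matmul(term, A)]
        result = matadd(result, term)
    return result

def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))

# Commuting pair: e^(S+T) and e^S e^T agree up to rounding.
S = [[1.0, 2.0], [0.0, 1.0]]
T = [[3.0, -1.0], [0.0, 3.0]]
commuting_gap = max_abs_diff(expm(matadd(S, T)), matmul(expm(S), expm(T)))

# Non-commuting pair: the identity fails.
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]
noncommuting_gap = max_abs_diff(expm(matadd(A, B)), matmul(expm(A), expm(B)))
print(commuting_gap, noncommuting_gap)
```

For the non-commuting pair, $e^Ae^B = \begin{pmatrix}2&1\\1&1\end{pmatrix}$ whereas $e^{A+B}$ has entries $\cosh 1$ and $\sinh 1$, so the gap is visibly non-zero.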

Exercises 13.19 

1. Calculate p(T) for the following matrices 


«(oJ)' ,b, (o»)- <c) (o“)' (d) (o«)- 

Only one of these examples satisfies p(T) = ||T||. 

2. Every idempotent P, except 0, satisfies p(P) = 1; every nilpotent Q has 
p(Q) = 0, and every cyclic element T has p(T) = 1. 

3. For any invertible $S$, $\rho(S^{-1}TS) = \rho(T)$, yet $\|S^{-1}TS\|$ may be much larger than $\|T\|$. For example, let $P := \begin{pmatrix}1&a\\0&0\end{pmatrix}$ and $S := \begin{pmatrix}1&0\\0&c\end{pmatrix}$; then $S^{-1}PS = \begin{pmatrix}1&ac\\0&0\end{pmatrix}$ has norm $\sqrt{1 + |ac|^2}$.

4. If $ST = TS$, then $\rho(ST) \le \rho(S)\rho(T)$. Deduce $\rho(T^{-1})^{-1} \le \rho(T)$, and find examples of non-commuting matrices where $\rho(ST) > \rho(S)\rho(T)$.

5. The equation $T - ATB = C$ has a solution $T = \sum_n A^nCB^n$ if $\rho(A)\rho(B) < 1$.

6. The radii of convergence of

$$\sum_{n=0}^\infty n^nT^n, \qquad \sum_{n=0}^\infty nT^n, \qquad \sum_{n=1}^\infty \cdots, \qquad \sum_{n=0}^\infty \cdots$$

are 0, 1, 1, $\infty$, respectively. A quick way of estimating the radius of convergence $R$ is to judge how fast the coefficients grow: if $c_0r_0^n \le |a_n| \le c_1r_1^n$ then $1/r_1 \le R \le 1/r_0$.

7. How are the radii of convergence of $\sum_n (a_n + b_n)T^n$ and $\sum_n a_nb_nT^n$ related to those of $\sum_n a_nT^n$ and $\sum_n b_nT^n$?

8. Let $f(T) := \sum_n a_nT^n$ and $g(T) := \sum_n b_nT^n$. Find the power series expansions of $f + g$, $fg$ and $f \circ g$. In particular, find the first few terms of

9. Let $f(T) := \sum_n a_nT^n$ be a power series, and $F(T) := \sum_n |a_n|T^n$; they have the same radius of convergence $R$. If $\|T\| < R$, then $\|f(T)\| \le F(\|T\|)$.

10. The convergence of a power series is uniform in $T$, for $\|T\| \le r < R$.

11. When $T$ satisfies a polynomial $p(T) = 0$, then every (convergent) power series on $T$ reduces to a polynomial in $T$.

12. (a) $e^0 = 1$, (b) the inverse of $e^T$ is $e^{-T}$, (c) $e^{nT} = (e^T)^n$.


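13. By analogy with the complex case, define the hyperbolic and trigonometric functions of $T$ as power series, and show

(a) $e^{\begin{pmatrix}0&-1\\1&0\end{pmatrix}x} = \begin{pmatrix}\cos x & -\sin x\\ \sin x & \cos x\end{pmatrix}$,

(b) $\cos\begin{pmatrix}1&1\\0&1\end{pmatrix}x = \begin{pmatrix}\cos x & -x\sin x\\ 0 & \cos x\end{pmatrix}$,

(c) $e^T = \cosh T + \sinh T$,

(d) $e^{iT} = \cos T + i\sin T$.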


14. Prove that there is a non-zero complex number $a$ such that $e^a = 1$. Thus the exponential function has a period, $e^{T+na} = e^T$. The smallest such number is $6.283\ldots i =: 2\pi i$.

15. * $(1 + T/n)^n \to e^T$ as $n \to \infty$.

(Hint: each component in the series is $\frac{1}{n^k}\binom{n}{k}T^k \to \frac{1}{k!}T^k$, then use Exercise 9.7(1).)

16. * The product of $n$ terms, $(1 + S/n)(1 + T/n)(1 + S/n) \cdots (1 + T/n) \to e^{S+T}$ as $n \to \infty$. (At least show convergence for each power term.)

17. * Trotter formula: $e^{S/n}e^{T/n}e^{S/n} \cdots e^{T/n} \to e^{S+T}$. For example,

$$e^{S+T} \approx e^{S/2}e^{T/2}e^{S/2}e^{T/2}.$$

Find the exact coefficients used in the Trotter-Suzuki approximation

$$e^{0.293S}e^{0.707T}e^{0.707S}e^{0.293T}$$

that make it the best possible to second order. These formulas are very useful to approximate $e^{S+T}$ whenever $S$ and $T$ do not commute.
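
The Trotter formula of Exercise 17 can be explored numerically before attempting a proof. The sketch below is only an illustration under stated assumptions: the exponential is computed from a truncated power series, the matrices `S` and `T` are hypothetical non-commuting examples, and the error is measured entrywise.

```python
def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(A, terms=40):
    """Partial sum 1 + A + A^2/2! + ... of the exponential series."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = [[x / n for x in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def trotter(S, T, n):
    """Product of n factors e^(S/n) e^(T/n)."""
    step = matmul(expm([[x / n for x in row] for row in S]),
                  expm([[x / n for x in row] for row in T]))
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = matmul(result, step)
    return result

def trotter_error(S, T, n):
    """Largest entrywise difference between the product and e^(S+T)."""
    target = expm([[S[i][j] + T[i][j] for j in range(2)] for i in range(2)])
    approx = trotter(S, T, n)
    return max(abs(target[i][j] - approx[i][j]) for i in range(2) for j in range(2))

# Hypothetical non-commuting pair.
S = [[0.0, 1.0], [0.0, 0.0]]
T = [[0.0, 0.0], [1.0, 0.0]]
errors = {n: trotter_error(S, T, n) for n in (10, 100, 1000)}
print(errors)
```

For this pair the error shrinks roughly in proportion to $1/n$, which is the first-order behavior one expects from the commutator term.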
13.3 The Group of Invertible Elements

Among the invertible elements of a Banach algebra, one finds all the exponentials 
e T (including all non-zero complex numbers) and all their products, as well as the 
unit ball around 1, as the next key theorem proves: 

Theorem 13.20 


If $\rho(T) < 1$ then $1 - T$ is invertible, $(1 - T)^{-1} = 1 + T + T^2 + \cdots$


Proof The radius of convergence of the series $\sum_n T^n$ is 1, by Hadamard's formula. For $\rho(T) < 1$, let $S_N := 1 + T + \cdots + T^N \to \sum_n T^n$. Then, remembering that $\rho(T) < 1 \Rightarrow T^N \to 0$ as $N \to \infty$ (Example 13.13(3)),

$$\begin{aligned}
S_N &= 1 + T + \cdots + T^N \\
TS_N &= T + \cdots + T^N + T^{N+1} \\
\Rightarrow\ (1 - T)S_N &= 1 - T^{N+1} \to 1.
\end{aligned}$$

Similarly, $S_N(1 - T) \to 1$ as $N \to \infty$. This shows that $\sum_n T^n$ is the inverse of $1 - T$. □
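
As a concrete check of Theorem 13.20, the partial sums $S_N$ can be computed for a matrix whose norm exceeds 1 but whose spectral radius does not. The matrix `T` below is a hypothetical example (upper triangular with diagonal entries 0.5), for which $(1 - T)^{-1}$ can be written down by hand.

```python
def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def neumann_sum(T, N):
    """Partial sum S_N = 1 + T + ... + T^N."""
    total = [[1.0, 0.0], [0.0, 1.0]]
    power = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(N):
        power = matmul(power, T)
        total = [[total[i][j] + power[i][j] for j in range(2)] for i in range(2)]
    return total

# Hypothetical example: rho(T) = 0.5 < 1 although the entry 2 makes ||T|| > 1.
T = [[0.5, 2.0], [0.0, 0.5]]
S_N = neumann_sum(T, 200)

# (1 - T) S_N should be close to the identity.
one_minus_T = [[1.0 - T[0][0], -T[0][1]], [-T[1][0], 1.0 - T[1][1]]]
residual = matmul(one_minus_T, S_N)
print(S_N, residual)
```

Here $1 - T = \begin{pmatrix}0.5&-2\\0&0.5\end{pmatrix}$, whose inverse is $\begin{pmatrix}2&8\\0&2\end{pmatrix}$, and the partial sums approach exactly this matrix.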

Theorem 13.21 

The invertible elements of a Banach algebra $X$ form a group $\mathcal{G}(X)$ with the operation of multiplication. $\mathcal{G}(X)$ is an open set in $X$, and the map $T \mapsto T^{-1}$ is differentiable on it.


Proof Multiplication in a Banach algebra is associative and has a unity $1 \in \mathcal{G}(X)$. To prove $\mathcal{G}(X)$ is a group, it needs to be shown that if $S, T \in \mathcal{G}(X)$, then $ST$ and $T^{-1}$ are invertible, a fact that is evident from

$$(ST)^{-1} = T^{-1}S^{-1}, \qquad (T^{-1})^{-1} = T.$$

Let $T$ be any invertible element of $X$, and consider any neighboring element

$$T + H = T(1 + T^{-1}H)$$

with $\|H\| < \|T^{-1}\|^{-1}$. Then $\rho(T^{-1}H) \le \|T^{-1}\|\|H\| < 1$, so that $1 + T^{-1}H$, and by implication $T + H$, are invertible. As the neighboring points of $T$ are invertible, $T$ is an interior point of $\mathcal{G}(X)$ and the group is open in $X$.

In fact, writing $T + H = T(1 + T^{-1}H)$,

$$(T + H)^{-1} = (1 + T^{-1}H)^{-1}T^{-1} = T^{-1} - T^{-1}HT^{-1} + T^{-1}HT^{-1}HT^{-1} - \cdots$$

This shows that $T \mapsto T^{-1}$ is differentiable with derivative $H \mapsto -T^{-1}HT^{-1}$, by verifying

$$\|T^{-1}HT^{-1}\| \le \|T^{-1}\|^2\|H\|,$$

$$\|T^{-1}HT^{-1}HT^{-1} - \cdots\| \le \sum_{n=0}^\infty \|H\|^{n+2}\|T^{-1}\|^{n+3} = \frac{\|H\|^2\|T^{-1}\|^3}{1 - \|T^{-1}\|\|H\|} = o(H).$$

□


A group, for which the acts of multiplication and taking the inverse are differen- 
tiable, is called a ‘Lie group’, a topic that has a vast literature devoted to it. 

In particular note that for $H = z1$,

$$(T + z)^{-1} = T^{-1} - zT^{-2} + z^2T^{-3} - \cdots, \qquad (13.3)$$

and that the map $z \mapsto T - z \mapsto (T - z)^{-1}$ is analytic wherever the inverse exists; its derivative is $(T - z)^{-2}$.

Examples 13.22 

1. The group of $N \times N$ invertible complex matrices is often denoted $GL(N, \mathbb{C})$. It has a group-morphism, the determinant $\det : GL(N, \mathbb{C}) \to \mathbb{C}^\times = \mathcal{G}(\mathbb{C})$,

$$\det AB = \det A \det B,$$

whose kernel is the normal subgroup $SL(N, \mathbb{C})$ of 'special matrices' with determinant 1.

2. In $\mathbb{C}$, when $z$ is large, $z^{-1}$ is small. But for general Banach algebras there is no such relation between $\|T^{-1}\|$ and $\|T\|$, e.g. the inverse of $(10, 0.01) \in \mathbb{C}^2$ is $(0.1, 100)$.

3. The set of non-invertible elements is closed in $X$. So the closure of a proper ideal is a proper ideal.

Proof By Example 13.5(5), $\mathcal{I} \subseteq \mathcal{G}(X)^c$, so $\bar{\mathcal{I}} \subseteq \mathcal{G}(X)^c$ and $\bar{\mathcal{I}}$ does not contain 1.

4. If $T$ is invertible, then $B_\epsilon(TS) \subseteq TB_{\|T^{-1}\|\epsilon}(S)$. Consequently, multiplication by $T$ is an open mapping.

Proof Let $\|A - TS\| < \epsilon$; then $\|T^{-1}A - S\| \le \|T^{-1}\|\|A - TS\| < \|T^{-1}\|\epsilon$, as required. If $U$ is an open set in $X$ and $S \in U$, then $S \in B_\epsilon(S) \subseteq U$, so

$$TS \in B_{\epsilon/\|T^{-1}\|}(TS) \subseteq TB_\epsilon(S) \subseteq TU$$

and $TU$ is open in $X$.

5. The set of non-invertible elements is path-connected (to the origin, say), and may disconnect the group of invertible elements, e.g. $GL(2, \mathbb{R})$ disconnects into the two open sets of matrices whose determinants are strictly positive and strictly negative, respectively.

The following proposition confirms that as an invertible operator $R$ approaches the boundary of $\mathcal{G}(X)$, $\|R^{-1}\|$ grows to infinity, as expected.

Proposition 13.23


Let $T$ be on the boundary of the group of invertible elements.

(i) For any invertible element $R$, $\|R^{-1}\| \ge 1/\|R - T\|$,

(ii) $T$ is a topological divisor of zero, meaning there are unit elements $S_n$ such that

$$TS_n \to 0 \text{ and } S_nT \to 0, \text{ as } n \to \infty.$$


Proof (i) Since $T$ is at the boundary of the open set of invertible elements, it cannot be invertible, whereas $R$ and all elements in its surrounding ball of radius $\|R^{-1}\|^{-1}$ are invertible, by the proof of the previous theorem. Thus $\|R - T\| \ge \|R^{-1}\|^{-1}$ as claimed.

(ii) Let invertible elements $R_n$ converge to a boundary element $T$, and let $S_n := R_n^{-1}/\|R_n^{-1}\|$; then

$$\begin{aligned}
\|TS_n\| &= \|TR_n^{-1}\|/\|R_n^{-1}\| \\
&= \|(T - R_n)R_n^{-1} + 1\|/\|R_n^{-1}\| \\
&\le \|T - R_n\| + 1/\|R_n^{-1}\| \\
&\le 2\|T - R_n\| \to 0 \text{ as } n \to \infty,
\end{aligned}$$

and similarly $S_nT \to 0$ as well. □

As remarked earlier, the group $\mathcal{G}(X)$ need not be a connected set, but splits into connected components, with, say, $\mathcal{G}_1$ being the component containing 1. Recall that a component is maximal connected, so if $\mathcal{G}_1$ contains part of a connected subset of $\mathcal{G}(X)$, it must contain all of it (Theorem 5.11).

Proposition 13.24


The component of invertible elements containing 1 is the open normal subgroup generated by $e^T$ for all $T$.

Proof $\mathcal{G}_1$ is open in $\mathcal{G}(X)$: Any $T \in \mathcal{G}_1$ is an interior point of $\mathcal{G}(X)$, so $T \in B_\epsilon(T) \subseteq \mathcal{G}(X)$. But the ball $B_\epsilon(T)$ is (path-)connected and intersects $\mathcal{G}_1$, so $B_\epsilon(T) \subseteq \mathcal{G}_1$.

$\mathcal{G}_1$ is a subgroup of $\mathcal{G}(X)$: Multiplication by $T$ is a continuous operation, so $T\mathcal{G}_1$ is connected (Proposition 5.5). When $T \in \mathcal{G}_1$, then $T = T1 \in T\mathcal{G}_1 \subseteq \mathcal{G}(X)$, so $\mathcal{G}_1$ contains part, and therefore all, of $T\mathcal{G}_1$. Hence $T, S \in \mathcal{G}_1 \Rightarrow TS \in T\mathcal{G}_1 \subseteq \mathcal{G}_1$. Similarly, inversion is a continuous mapping, so $\mathcal{G}_1^{-1}$ is connected; it contains 1, so must be a subset of $\mathcal{G}_1$, i.e., $T \in \mathcal{G}_1 \Rightarrow T^{-1} \in \mathcal{G}_1$.

$\mathcal{G}_1$ is a normal subgroup: By the same reasoning, for any invertible $T$, $T^{-1}\mathcal{G}_1T$ is a connected subset of $\mathcal{G}(X)$ and contains 1, so it is a subset of $\mathcal{G}_1$ (in fact it must equal it).

$\mathcal{G}_1$ is generated by the exponentials: Let $\mathcal{E}$ be the group generated by the exponentials $e^T$ for all $T \in X$; its elements are finite products $e^T \cdots e^S$, and their inverses are of the same type, $(e^T \cdots e^S)^{-1} = e^{-S} \cdots e^{-T}$. It contains $1 = e^0$, and is connected since there is a continuous path from 1 to every element $e^T \cdots e^S$, namely $t \mapsto e^{tT} \cdots e^{tS}$ for $t \in [0, 1]$. We can conclude that $\mathcal{E}$ lies inside $\mathcal{G}_1$.

The elements near to 1 are all exponentials,¹ $1 + H = e^{\log(1+H)}$, and so a small enough neighborhood around $E := e^T \cdots e^S \in \mathcal{E}$ consists of elements

$$E + H = E(1 + E^{-1}H) = e^T \cdots e^S e^{\log(1 + E^{-1}H)} \in \mathcal{E}$$

for $\|H\| < e^{-\|S\|} \cdots e^{-\|T\|}$. This means that $e^T \cdots e^S$ is an interior point of $\mathcal{E}$, which is thus open. Its complement in $\mathcal{G}_1$ is also open, since $\mathcal{G}_1 \setminus \mathcal{E} = \bigcup_{T \in \mathcal{G}_1 \setminus \mathcal{E}} T\mathcal{E}$ (prove!) and each $T\mathcal{E}$ is open (Example 13.22(4)). $\mathcal{E}$, being open and closed in $\mathcal{G}_1$, must equal $\mathcal{G}_1$ (Proposition 5.3). □

Exercises 13.25 

1. The invertible elements of $\mathbb{C}^N$ are $(z_1, \ldots, z_N)$ such that none of the components are zero.

2. In $\ell^\infty$, a sequence $(a_n)$ is invertible if, and only if, it is bounded away from 0, i.e., $0 < c \le |a_n|$. Paths $t \mapsto w(t)$ in $C[0, 1]$ are invertible when they do not pass through 0.

3. In $B(X)$, the invertible elements are the automorphisms of $X$.

4. In $B(X)$, $\|T^{-1}\| = 1/\inf_{\|x\|=1}\|Tx\|$.

5. In $X \times Y$, $(S, T)$ is invertible if, and only if, both $S$ and $T$ are invertible.

6. The integral operator on $C[a, b]$, $Tf(y) := \int_a^b k(x, y)f(x)\,dx$ has norm satisfying $\|T\| \le \|k\|_{L^\infty}|b - a|$. Deduce that when $\|k\|_{L^\infty} < 1/|b - a|$, the equation $Tf + g = f$ has the unique solution $f = \sum_n T^ng$.

1. If T is invertible and Tx — y, (T + H)(x + x e ) = y, then < |-||r||||^|| ■ 


1 This was stated, not proved, in Proposition 13.18 but the argument is not circular. 
8. The map $t \mapsto e^{tT}$ is a differentiable group-morphism $\mathbb{R} \to \mathcal{G}(X)$; its derivative at $t$ is $Te^{tT}$.

9. * Conversely, every differentiable group-morphism $A : \mathbb{R} \to \mathcal{G}(X)$, meaning $A_{t+s} = A_tA_s$, is of this type:

(a) $\exists h > 0$, $\int_0^h A_\tau\,d\tau$ is invertible, by the mean value theorem (Proposition 12.9), and $\int_t^{t+h} A = (\int_0^h A)A_t$;

(b) Let $T := (A_h - 1)(\int_0^h A)^{-1}$, so that $A_{t+h} = A_t + hTA_t + o(h)$;

(c) $\frac{d}{dt}(A_te^{-tT}) = (\frac{d}{dt}A_t)e^{-tT} - A_tTe^{-tT} = 0$, so $A_t = A_0e^{tT} = e^{tT}$.

10. Verify Proposition 13.23 for

11. A topological divisor of zero, also called a generalized divisor of zero, does not have right or left inverses.

12. The right-shift operator $R$ on $\ell^\infty$ is a right divisor of zero but not a topological divisor of zero.

13. In finite dimensions, there is no distinction between divisors of zero and topological ones. (Hint: $S_n \in B_X$, which is compact.)

14. An isomorphism between Banach algebras preserves topological divisors of zero.

15. If $R$ is invertible, then $\|R^{-1}\| \ge 1/d(R, \partial\mathcal{G}(X))$. (Hint: By the definition of $d(R, \partial\mathcal{G}(X))$ (Example 2.20(9)), there is a sequence $T_n \in \partial\mathcal{G}(X)$ such that $\|T_n - R\| \to d(R, \partial\mathcal{G}(X))$.)



13.4 Analytic Functions 

There are two ways of connecting the coefficients of a power series to its function 


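$$f(z) = a_0 + a_1z + a_2z^2 + \cdots,$$


(i) by differentiation

$$f^{(n)}(z) = n!\,a_n + (n+1)!\,a_{n+1}z + \cdots, \qquad f^{(n)}(0) = n!\,a_n,$$


(ii) by integration

$$\frac{f(z)}{z^n} = \frac{a_0}{z^n} + \cdots + \frac{a_{n-1}}{z} + a_n + \cdots, \qquad \oint \frac{f(z)}{z^n}\,dz = 2\pi i\,a_{n-1}.$$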


These formulas raise the possibility of creating a power series from a given function, 
by defining the coefficients in these ways. The latter one is more useful because it 
does not assume / to be differentiable infinitely often. 
Theorem 13.26 Taylor series


If $f : \mathbb{C} \to \mathbb{C}$ is analytic in a disk $B_R(0)$, then it is a power series inside the disk. For $\rho(T) < R$,

$$f(T) := \frac{1}{2\pi i}\oint f(z)(z - T)^{-1}\,dz = \sum_{n=0}^\infty a_nT^n,$$

where

$$a_n = \frac{1}{2\pi i}\oint f(z)z^{-1-n}\,dz = \frac{f^{(n)}(0)}{n!}, \quad \forall n \in \mathbb{N},$$

and $\forall r < R$, $\exists c_r$, $\forall n \in \mathbb{N}$, $|a_n| \le \dfrac{c_r}{r^n}$.


Proof The path of integration is along a circle with center 0 and radius $r$ just less than $R$ (but larger than $\rho(T)$). For $z$ on this circle, $\rho(T/z) = \rho(T)/r < 1$, so

$$(z - T)^{-1} = z^{-1}(1 - T/z)^{-1} = \sum_{n=0}^\infty z^{-1-n}T^n, \text{ and}$$

$$\frac{1}{2\pi i}\oint f(z)(z - T)^{-1}\,dz = \sum_{n=0}^\infty \frac{1}{2\pi i}\oint f(z)z^{-1-n}\,dz\;T^n = \sum_{n=0}^\infty a_nT^n.$$

However we need to justify the swap of the summation with the integral. Recall that $z \mapsto (z - T) \mapsto (z - T)^{-1}$ is continuous in $z$ by (13.3), and the circle is a compact set, so $\|f(z)(z - T)^{-1}\| \le C$ for $z$ on the circle (Corollary 6.16). It follows that

$$\Big\|\sum_{n=N}^\infty f(z)T^n/z^{n+1}\Big\| = \|T^Nf(z)(z - T)^{-1}/z^N\| \le C\|T^N\|/r^N \to 0$$

uniformly in $z$. So $\sum_{n=0}^N \oint f(z)T^n/z^{n+1}\,dz \to \oint \sum_{n=0}^\infty f(z)T^n/z^{n+1}\,dz$.

Note that

$$|a_n| \le \frac{1}{2\pi}\oint c/r^{n+1}\,ds = c/r^n,$$

where $c$ is the maximum value of $|f|$ on the compact disk $B_r[0] \subset \mathbb{C}$. The radius of convergence of this power series is at least $R$ since

$$\liminf_n |a_n|^{-1/n} \ge \lim_{n\to\infty} \frac{r}{c^{1/n}} = r, \quad \forall r < R.$$

To justify the use of the notation $f(T)$, we need to show that when $T$ is a complex number $a1$, the two uses of the symbol $f$ agree, i.e., $f(a1) = f(a)1$; but this is just Cauchy's integral formula, $f(a) = \frac{1}{2\pi i}\oint f(z)/(z - a)\,dz$. Consequently an analytic function is indeed a power series. □
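
The integral formula for the coefficients lends itself to a quick numerical experiment. The sketch below approximates $a_n = \frac{1}{2\pi i}\oint f(z)z^{-1-n}\,dz$ by the trapezoid rule on a circle, taking $f = \exp$ as a test function so that the answer $1/n!$ is known; the function name, the radius and the number of sample points are illustrative assumptions.

```python
import cmath
import math

def taylor_coefficient(f, n, radius=1.0, samples=256):
    """Approximate (1/(2*pi*i)) * contour integral of f(z) z^(-1-n) on |z| = radius."""
    total = 0j
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        z = radius * cmath.exp(1j * theta)
        # With dz = i z d(theta), the integrand becomes f(z) z^(-n) d(theta) / (2*pi).
        total += f(z) * z ** (-n)
    return total / samples

coefficients = [taylor_coefficient(cmath.exp, n) for n in range(6)]
for n, a_n in enumerate(coefficients):
    print(n, a_n.real, 1.0 / math.factorial(n))
```

Because the integrand is analytic and periodic in the angle, the trapezoid rule converges very quickly here, so a few hundred sample points already give the coefficients to rounding error.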

Proposition 13.27 Liouville’s theorem 

If an analytic function on $\mathbb{C}$ grows polynomially, $|f(z)| \le c|z|^n$, then $f$ is a polynomial of degree at most $n$. In particular, if $f$ is bounded then it is constant.


Proof If $f : \mathbb{C} \to \mathbb{C}$ is analytic on $\mathbb{C}$ and grows polynomially, then its maximum value on a disk of radius $r$ is $c_r \le cr^n$. So the $m$th Taylor coefficient vanishes for $m > n$,

$$|a_m| \le c_r/r^m \le cr^{n-m} \to 0 \text{ as } r \to \infty.$$

This also applies to vector-valued analytic functions $F : \mathbb{C} \to X$. For any functional $\phi \in X^*$, $\phi \circ F : \mathbb{C} \to \mathbb{C}$ is also analytic. If $F$ grows polynomially, then so does $\phi \circ F$,

$$|\phi \circ F(z)| \le \|\phi\|\|F(z)\| \le \|\phi\|c|z|^n,$$

which implies that $\phi \circ F(z)$ is a polynomial $a_0 + a_1z + \cdots + a_nz^n$. In fact, by Example 12.3(3), $a_n = \phi \circ F^{(n)}(0)/n!$, so that

$$\phi \circ F(z) = \phi \circ (F(0) + F'(0)z + \cdots + F^{(n)}(0)z^n/n!).$$

As $\phi$ is arbitrary, we deduce that $F(z)$ is a polynomial in $z$. □

Theorem 13.28 Laurent series 


If $f : \mathbb{C} \to \mathbb{C}$ is analytic in a ring $B_R(0) \setminus B_r[0]$, and $r < \rho(T^{-1})^{-1} \le \rho(T) < R$, then

$$f(T) := \frac{1}{2\pi i}\oint f(z)(z - T)^{-1}\,dz = \sum_{n=-\infty}^\infty a_nT^n,$$

where $a_n = \frac{1}{2\pi i}\oint f(z)z^{-1-n}\,dz$, $\forall n \in \mathbb{Z}$. The residue of $f$ in $B_r[0]$ is $a_{-1}$.


The path of integration is here understood to be just within the boundary of the ring, going counter-clockwise around a circle of radius just smaller than $R$, and clockwise around a circle just larger than $r$. Note that $R$ is allowed to be infinite, in which case substitute $R$ with any value larger than $\rho(T)$.

Proof A Laurent series can be thought of as the sum of two separate power series, $\sum_{n=0}^\infty a_nT^n + \sum_{n=1}^\infty a_{-n}T^{-n}$, one in $T$ and the other in $T^{-1}$. If $R$ and $R'$ are the respective radii of convergence, then absolute convergence occurs only when $\rho(T) < R$ and $\rho(T^{-1}) < R'$.

For $z$ on the bigger circle, $\rho(T/z) = \rho(T)/|z| < 1$ if the radius is close enough to $R$, so just like the proof of the Taylor series,

$$\frac{1}{2\pi i}\oint f(z)(z - T)^{-1}\,dz = \sum_{n=0}^\infty a_nT^n.$$

For $z$ on the smaller circle, $\rho(zT^{-1}) = |z|\rho(T^{-1}) < 1$ when its radius is close enough to $r$, so

$$(z - T)^{-1} = -(1 - zT^{-1})^{-1}T^{-1} = -\sum_{n=0}^\infty z^nT^{-n-1}$$

and (along a counter-clockwise path)

$$\frac{1}{2\pi i}\oint f(z)(z - T)^{-1}\,dz = -\sum_{n=1}^\infty \frac{1}{2\pi i}\oint f(z)z^{n-1}\,dz\;T^{-n} = -\sum_{n=1}^\infty a_{-n}T^{-n}.$$

Combining the two integrals and series gives Laurent's expansion. Note that the second series vanishes when $f$ is analytic within $B_r(0)$, by Cauchy's theorem, so it is consistent with Taylor's theorem.

Since the Laurent series converges uniformly strictly within the annulus, we obtain

$$\frac{1}{2\pi i}\oint f(z)\,dz = \frac{1}{2\pi i}\sum_{n=-\infty}^\infty \oint a_nz^n\,dz = a_{-1}.$$

□


These two theorems of course also apply, by translating, to disks and rings with center $z_0$; the resulting series will then be $\sum_n a_n(T - z_0)^n$.

Proposition 13.29 


The zeros of a non-zero analytic function, defined on an open connected 
subset of C, are isolated. 





Proof Suppose an interior zero $w$ of $f : \Omega \to \mathbb{C}$ is a limit point of other zeros, $z_n \to w$ ($z_n \ne w$). Then $f$ can be written as a power series $f(z) = \sum_k a_k(z - w)^k$ in some neighborhood of $w$. If $a_K$ is the first non-zero coefficient, then

$$0 = f(z_n) = (z_n - w)^K\big(a_K + a_{K+1}(z_n - w) + \cdots\big),$$

$$\therefore\ 0 = a_K + a_{K+1}(z_n - w) + \cdots \to a_K \text{ as } z_n \to w.$$

This contradiction determines that $f$ is locally zero in $\Omega$. Hence it is zero in $\Omega$ (Exercise 5.7(9)). □

Examples 13.30 

1. The Fourier series $\sum_{n=-\infty}^\infty a_ne^{in\theta}$ is a Laurent series with $T = e^{i\theta}$.

2. ► For polynomials (and circular paths as in the theorems),

$$p(T) = \frac{1}{2\pi i}\oint p(z)(z - T)^{-1}\,dz.$$

For example,

$$1 = \frac{1}{2\pi i}\oint (z - T)^{-1}\,dz, \qquad T = \frac{1}{2\pi i}\oint z(z - T)^{-1}\,dz, \qquad T^{-1} = \frac{1}{2\pi i}\oint \frac{1}{z}(z - T)^{-1}\,dz.$$

Proof for $T^{-1}$: We can use Laurent's expansion on a path $z(\theta) = re^{i\theta}$, since $1/z$ is analytic everywhere except at 0,

$$a_n = \frac{1}{2\pi i}\oint \frac{1}{z^{n+2}}\,dz = \frac{1}{2\pi}\int_0^{2\pi} \frac{1}{r^{n+1}}e^{-i(n+1)\theta}\,d\theta = 0$$

unless $n = -1$, in which case $a_{-1} = 1$. So $\sum_n a_nT^n = T^{-1}$.

3. ► We can finally show $e^{\log(1+T)} = 1 + T$ for $\rho(T) < 1$.

Proof Let $f(z) := e^{\log(1+z)}$ for $|z| < 1$; then $f'(z) = e^{\log(1+z)}/(1 + z)$ and $f''(z) = 0$ (check!). So the non-zero coefficients of its Taylor series are $a_0 = f(0) = e^0 = 1$ and $a_1 = f'(0) = 1$. Hence $f(T) = 1 + T$.

4. Binomial theorem: $(1 + T)^p := e^{p\log(1+T)} = 1 + pT + \binom{p}{2}T^2 + \cdots$ provided $\rho(T) < 1$, $p \in \mathbb{C}$, and $\binom{p}{n} := \dfrac{p(p-1)\cdots(p-n+1)}{n!}$.

Proof Define the analytic function $f(z) := (1 + z)^p = e^{p\log(1+z)}$ inside the unit disk $B_\mathbb{C}$. Its derivatives are, by induction,

$$\begin{aligned}
f^{(n)}(z) &= p(p-1)\cdots(p-n+1)\,e^{(p-n+1)\log(1+z)}(1 + z)^{-1} \\
&= p(p-1)\cdots(p-n+1)(1 + z)^{p-n},
\end{aligned}$$

so its power series coefficients are $a_n = f^{(n)}(0)/n! = \binom{p}{n}$.

5. ► There are versions of these series expansions valid for a vector-valued function $F : \mathbb{C} \to X$, where $X$ is a Banach space and $F$ is analytic inside a ring, $r < |z| < R$.

$$F(z) = \frac{1}{2\pi i}\oint F(w)(w - z)^{-1}\,dw = \sum_n A_nz^n, \quad \text{where } A_n := \frac{1}{2\pi i}\oint F(w)w^{-1-n}\,dw \in X.$$

Proof For any $\phi \in X^*$, the map $\phi \circ F : \mathbb{C} \to \mathbb{C}$, being the composition of differentiable functions, is analytic on the ring $B_R(0) \setminus B_r[0]$, so it has a Laurent expansion $\phi \circ F(z) = \frac{1}{2\pi i}\oint \phi \circ F(w)(w - z)^{-1}\,dw = \sum_n b_nz^n$ for $r < |z| < R$ and $b_n = \phi A_n$. But $\phi$ is linear and continuous, so it can be extracted out of the integrals and series,

$$\phi \circ F(z) = \phi\Big(\frac{1}{2\pi i}\oint F(w)(w - z)^{-1}\,dw\Big) = \phi\sum_n A_nz^n,$$

and as $\phi$ is arbitrary, the result follows.

Exercises 13.31

1. Let $T := (\,\cdots)$; verify directly that $T = \frac{1}{2\pi i}\oint z(z - T)^{-1}\,dz$ by calculating the integral in a circular path around the origin.

2. Show that there are no analytic functions in $\mathbb{C}$ which grow at a fractional power rate $|z|^{m/n}$ ($m/n \notin \mathbb{N}$).

3. Show that the Laurent series for $\cot T$, valid for $\rho(T) < \pi$, $\rho(T^{-1}) > 0$, is

$$\cot T = T^{-1} - \frac{1}{3}T - \frac{1}{45}T^3 - \frac{2}{945}T^5 - \cdots,$$

and find its residue at 0. (Hint: $\cot z = (1 - z^2/2 + z^4/24 - \cdots)/z(1 - z^2/6 + \cdots)$.)

4. If an identity between analytic functions, $f(z) = g(z)$, holds in a complex disk $B_R(0)$, then it holds for any $T$ with $\rho(T) < R$.

5. Justify the identity $n\log(1 + T) = \log(1 + T)^n$, hence deduce the assertion

$$\lim_{n\to\infty}(1 + T/n)^n = e^T.$$

6. A function on $\mathbb{C}$ has a pole $a$ of order $N$ if, and only if, it has a Laurent series expansion $\sum_{n=-N}^\infty a_n(z - a)^n$ about $a$; its residue is $a_{-1}$.

7. * Two analytic functions on an open connected subset of $\mathbb{C}$ must be identically equal if they are equal on an interior disk. (Consider the interior of the set for which $f = g$.)

8. Suppose $f$ is analytic on the extended complex plane, except for isolated points, i.e., $f(1/z)$ is also analytic at 0.

(a) Show that $f$ has a finite number of zeros and poles (except for $f = 0$),

(b) Using polynomials $p$, $q$ whose roots are these zeros and poles, respectively, deduce that $f$ is a rational function $p/q$.

Remarks 13.32 

1 . A subalgebra must have the same unity as the algebra — it is not enough that it 
has a unity. For example, C (Exercise 13.10(2)) contains the set { (0, a) : a e C } 
which is closed under addition and multiplication and has its own unity (0, 1), 
different from C’s unity (1 , 0); it is an algebra, but not a subalgebra of C. Instead, 
the set { (a, 0) : a e C } is a subalgebra of C. 

2. The axiom $\Phi 1 = 1$ of an algebra morphism does not follow from the other properties of $\Phi$. For example, the map $\Phi : \mathbb{C} \to C$ defined by $\Phi(z) := (0, z)$ satisfies all the properties of a Banach algebra morphism, except that $\Phi(1) = (0, 1) \ne (1, 0)$. But continuity of characters follows from their other properties (Proposition 14.34).

3. * The proof of the embedding of $X$ into $B(X)$ does not make essential use of the axiom $\|1\| = 1$, or of $\|ax\| \le \|a\|\|x\|$. If instead, $\|1\| = c$ and $\|ax\| \le c'\|a\|\|x\|$, one gets

$$\|a\| = \|L_a1\| \le c\|L_a\|, \qquad \|L_a\| \le c'\|a\|.$$

Thus $X$ has an equivalent norm defined by $|||a||| := \|L_a\|$, with $|||1||| = \|I\| = 1$ and

$$|||xy||| = \|L_{xy}\| = \|L_xL_y\| \le \|L_x\|\|L_y\| = |||x|||\,|||y|||.$$

4. In the Banach algebra $B(X)$, one can define $\rho_x(T) := \limsup_n \|T^nx\|^{1/n}$; so $0 \le \rho_x(T) \le \rho(T)$. The series $\sum_n a_nT^nx$ converges absolutely when $\rho_x(T)$ is less than the radius of convergence.


Chapter 14 

Spectral Theory 


A moment's reflection shows that, by Cauchy's residue theorem, the path of integration in $f(T) = \frac{1}{2\pi i}\oint f(z)(z - T)^{-1}\,dz$ can be modified, as long as $f$ and $(z - T)^{-1}$ remain analytic over the swept region. We are thus led to study the region where $z - T$ is not invertible, called the spectrum of $T$.

Definition 14.1

The spectrum of an element $T$ in a Banach algebra is defined as the set

$$\sigma(T) := \{\lambda \in \mathbb{C} : T - \lambda \text{ is not invertible}\}.$$

Its complement $\mathbb{C} \setminus \sigma(T)$ is called the resolvent of $T$.


Examples 14.2 


1. $\sigma(z) = \{z\}$ (since $z - \lambda$ is not invertible when $\lambda = z$).

2. ► Recall that a square matrix $A$ is non-invertible $\Leftrightarrow$ $A$ is not 1-1 $\Leftrightarrow$ $\det A = 0$. The spectrum of an $n \times n$ matrix consists of its eigenvalues, i.e., the roots of the characteristic polynomial equation $\det(T - \lambda) = 0$ of degree $n$.


For example, the spectra of the 2x2 matrices 


COPCO- 


and 



, are {0}, {0}, {—1, 1}, and {a, b } respectively. 


Note that it is possible to have different elements with the same spectrum. The 
spectrum is a sort of ‘shadow’ of T — it yields important information about T , 
but need not identify it. 


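3. ► The spectrum of a sequence $x = (a_n) \in \ell^\infty$ is $\sigma(x) = \overline{\operatorname{im} x} = \overline{\{a_n : n \in \mathbb{N}\}}$.


Proof The inverse of the sequence $x - \lambda = (a_n - \lambda)$ is bounded iff $|a_n - \lambda| \ge c > 0$ for all $n$, hence $\lambda \notin \sigma(x) \Leftrightarrow \lambda$ is an exterior point of $\{a_n\}$.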


J. Muscat, Functional Analysis, DOI: 10. 1007/978-3-3 19-06728-5_14, 
4. A spectral value of an operator $T \in B(X)$ is a complex number $\lambda$ for which the equation $(T - \lambda)x = y$ is not well-posed; one sometimes sees in practice that as one varies a parameter $\lambda$ of a model, some specific values have unstable solutions that 'resonate'.

5. (a) ► Translations, 'rotations' (in the sense of multiplication by $e^{i\theta}$) and scaling of $T$ have corresponding actions on its spectrum:

$$\sigma(T + z) = \sigma(T) + z, \qquad \sigma(zT) = z\sigma(T),$$

since $(T + z) - \lambda = T - (\lambda - z)$, so $\lambda \in \sigma(T + z) \Leftrightarrow \lambda - z \in \sigma(T)$; for $z \ne 0$, $(zT) - \lambda = z(T - \lambda/z)$, so $\lambda \in \sigma(zT) \Leftrightarrow \lambda/z \in \sigma(T)$.

(b) If $T$ is invertible, then $\sigma(T^{-1}) = \sigma(T)^{-1} := \{\lambda^{-1} : \lambda \in \sigma(T)\}$, since $T^{-1} - \lambda = -\lambda T^{-1}(T - \lambda^{-1})$, so $\lambda \in \sigma(T^{-1}) \Leftrightarrow \lambda^{-1} \in \sigma(T)$ (note that $\lambda \ne 0$).

(c) The matrices $S := (\,\cdots)$ and $T := (\,\cdots)$ show that there is no simple relation between $\sigma(S + T)$ or $\sigma(ST)$ and $\sigma(S)$ and $\sigma(T)$ in general.

(d) $\sigma(ST) = \sigma(TS) \cup \{0\}$ OR $\sigma(ST) = \sigma(TS) \setminus \{0\}$.

Proof For $\lambda \ne 0$ and $ST - \lambda$ invertible, $(TS - \lambda)^{-1} = \frac{1}{\lambda}(T(ST - \lambda)^{-1}S - 1)$, since

$$(TS - \lambda)(T(ST - \lambda)^{-1}S - 1) = T(ST - \lambda)(ST - \lambda)^{-1}S - (TS - \lambda) = \lambda,$$

$$(T(ST - \lambda)^{-1}S - 1)(TS - \lambda) = T(ST - \lambda)^{-1}(ST - \lambda)S - (TS - \lambda) = \lambda.$$

Thus, $\sigma(TS) \subseteq \sigma(ST) \cup \{0\}$; indeed, reversing the roles of $S$ and $T$ shows $\sigma(TS) \cup \{0\} = \sigma(ST) \cup \{0\}$.

(e) In particular, $\sigma(S^{-1}TS) = \sigma(T)$.
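
For $2 \times 2$ matrices the statement about $\sigma(ST)$ and $\sigma(TS)$ is easy to test, since the eigenvalues are the roots of $\lambda^2 - (\operatorname{tr} A)\lambda + \det A = 0$. The matrices `S` and `T` below are hypothetical examples chosen so that $ST \ne TS$; the helper names are likewise illustrative.

```python
import cmath

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def eigenvalues(A):
    """Roots of the characteristic polynomial of a 2x2 matrix."""
    trace = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(trace * trace - 4 * det)
    roots = [(trace - disc) / 2, (trace + disc) / 2]
    return sorted(roots, key=lambda z: (z.real, z.imag))

# Hypothetical non-commuting pair.
S = [[1, 2], [3, 4]]
T = [[0, 1], [1, 1]]
ST, TS = matmul(S, T), matmul(T, S)
spec_ST, spec_TS = eigenvalues(ST), eigenvalues(TS)
print(ST, TS)
print(spec_ST, spec_TS)
```

Although the two products are different matrices, they share the same trace and determinant, and hence the same spectrum.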

Example: Quadratic Forms 

Extracting the spectrum of matrices is one of the most useful application of 
mathematics. Quadratic forms are expressions of degree 2 in a number of variables, 
such as 


/ ° d / 2 f/ 2 \ / x \ 

q(x, y, z) — ax 2 +by 2 +cz 2 +dxy+eyz + fzx = (x y z) I d/2 b e/2 ) j v J . 

V//2 e/2 c )\z) 

They are found in the equations of conics and quadrics, the fundamental forms 
of surface geometry, the inertia tensor and stress tensor of mechanics, the integral 
forms of number theory, the covariances of statistics, etc. They can always be written 
as q(x) = x t Ajc, with A a symmetric matrix. We will see later that when the 
coefficients are real, such matrices have real eigenvalues, X\, ... ,X^, and there 
exists an orthogonal matrix P such that P~ l AP = D, where D consists solely of the 
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eigenvalues on the main diagonal. So the orthogonal transformation* 5c := P 'x 
gives a new simplified quadratic form 

q(x) = x 1 Ax = x T P T AP5c — 5c J D5c = X\5c 2 H + Xn5c^ =: q(5c). 

These eigenvalues are intrinsic to the quadratic form, in the sense that any rotation 
of the variables gives a quadratic form with the same spectrum, and so represent real 
information about it rather than about the choice of variables. Not surprisingly these 
values were discovered before the connection with linear algebra became clear, and 
called by a variety of names such as “principal curvatures”, “principal moments”, 
“principal component variances”, etc., in the different contexts. For example, a conic
that satisfies the equation $ax^2 + bxy + cy^2 = 1$ can also be represented by the
equation $\lambda\tilde{x}^2 + \mu\tilde{y}^2 = 1$, where $(\tilde{x}, \tilde{y})$ are obtained by a rotation/reflection of $(x, y)$.
Hence there are four types, depending on the signs of $\lambda$, $\mu$: ellipses, hyperbolas,
parallel lines, or the empty set.
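As an illustration of the four types, here is a minimal sketch that computes the two eigenvalues of the symmetric matrix of $ax^2 + bxy + cy^2$ in closed form and reads off the type of conic from their signs. The function names and the tolerance `tol` are assumptions made for this example only.

```python
import math

def principal_values(a, b, c):
    """Eigenvalues (larger first) of the symmetric matrix [[a, b/2], [b/2, c]]."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b / 2)
    return mean + radius, mean - radius

def classify_conic(a, b, c, tol=1e-12):
    """Type of the conic a x^2 + b x y + c y^2 = 1, read from the eigenvalue signs."""
    lam, mu = principal_values(a, b, c)   # lam >= mu
    if mu > tol:
        return "ellipse"            # both eigenvalues positive
    if lam > tol and mu < -tol:
        return "hyperbola"          # eigenvalues of opposite sign
    if lam > tol:
        return "parallel lines"     # one positive, one zero eigenvalue
    return "empty set"              # no positive eigenvalue
```

For instance, $xy = 1$ has eigenvalues $\pm\frac12$ and is a hyperbola, while $(x + y)^2 = 1$ has eigenvalues 2 and 0 and is a pair of parallel lines.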

14.1 The Spectral Radius 

Determining the exact spectral values of an element is usually a non-trivial problem. 
The fundamental theorem for the general case is: 

Theorem 14.3 

The spectrum of $T$ is a non-empty compact subset of $\mathbb{C}$. The largest extent
of $\sigma(T)$, called the spectral radius of $T$, is

\[ \max\{|\lambda| : \lambda \in \sigma(T)\} = \rho(T) = \lim_{n \to \infty} \|T^n\|^{1/n}. \]


Proof $\sigma(T)$ is compact: If $|\lambda| > \rho(T)$, then $\rho(T/\lambda) = \rho(T)/|\lambda| < 1$, so
$T - \lambda = -\lambda(1 - T/\lambda)$ is invertible (Theorem 13.20). Spectral values are therefore bounded
by $\rho(T)$.

The resolvent set is none other than $f^{-1}\mathcal{G}(\mathcal{X})$ where $f(z) := T - z$, and $\mathcal{G}(\mathcal{X})$
is the set of invertible elements of $\mathcal{X}$. Since $\mathcal{G}(\mathcal{X})$ is open in $\mathcal{X}$ and $f$ is
continuous, it follows that the resolvent set is open (Theorem 3.7), and the spectrum is
closed in $\mathbb{C}$. More concretely, if $T - \lambda$ is invertible, and $z$ is close enough to $\lambda$, then
$|z - \lambda| = \|(T - z) - (T - \lambda)\|$ implies that $T - z$ is also invertible (Theorem 13.21).

The spectrum $\sigma(T)$, being a closed and bounded set in $\mathbb{C}$, is compact (Corollary
6.20).

$\sigma(T)$ is non-empty: Applying Theorem 13.26, with $f(z) := 1$, and a circular path
centered at the origin with radius larger than $\rho(T)$, gives

\[ \frac{1}{2\pi i} \oint (z - T)^{-1}\, dz = 1. \]

But the map $z \mapsto (z - T)^{-1}$ is analytic on $\mathbb{C} \setminus \sigma(T)$ by (13.3). This would contradict
Cauchy’s theorem (Theorem 12.16) were the spectrum empty.

The spectral radius is $\rho(T)$: Let $r_\sigma$ be the largest extent of $\sigma(T)$, and consider
the function $f : z \mapsto (z - T)^{-1}$; it is analytic on $\mathbb{C} \setminus \sigma(T)$, in particular on $\mathbb{C} \setminus B_{r_\sigma}[0]$.
So it has a Laurent series $\sum_n A_n z^n$, valid for all $|z| > r_\sigma$ (Example 13.30(5)). On
the other hand, we know that

\[ (z - T)^{-1} = \frac{1}{z}(1 - T/z)^{-1} = \sum_{n=0}^{\infty} \frac{T^n}{z^{n+1}} \quad \text{for } |z| > \rho(T). \]

The two series must be identical, $\sum_{n=-\infty}^{\infty} A_n z^n = \sum_{n=0}^{\infty} T^n/z^{n+1}$, and remain valid
for all $|z| > r_\sigma$. But the second series diverges when $\rho(T) > \liminf_n |z^{-n}|^{-1/n} = |z|$
by the Cauchy-Hadamard theorem, so there can be no $z \in \mathbb{C}$ such that $r_\sigma < |z| <
\rho(T)$, in other words, $r_\sigma = \rho(T)$. □

This is a surprising result: one might expect p(T) to depend on the specific norm 
used for a square matrix T, but the spectrum of T consists of its eigenvalues, which 
are determined by an algebraic equation. 
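The independence from the norm can be observed numerically. The sketch below estimates $\|T^n\|^{1/n}$ for a $2 \times 2$ matrix with two different norms; the matrix (eigenvalue $0.5$ but a large off-diagonal entry) and the helper names are assumptions chosen for illustration.

```python
import math

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def row_sum_norm(A):
    """Operator norm induced by the sup-norm: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)

def frobenius_norm(A):
    """Square root of the sum of squares of all entries."""
    return math.sqrt(sum(abs(x) ** 2 for row in A for x in row))

def radius_estimate(T, n, norm):
    """The n-th term ||T^n||^(1/n) of the sequence converging to the spectral radius."""
    P = T
    for _ in range(n - 1):
        P = matmul(P, T)
    return norm(P) ** (1.0 / n)

T = [[0.5, 10.0], [0.0, 0.5]]   # only eigenvalue is 0.5, yet the norm of T exceeds 10
estimates = [(radius_estimate(T, n, row_sum_norm), radius_estimate(T, n, frobenius_norm))
             for n in (1, 10, 100, 400)]
```

Both columns of `estimates` start above 10 and drift towards $0.5$, although slowly: for this matrix the row-sum norm gives $\|T^n\|^{1/n} = 0.5\,(1 + 20n)^{1/n}$.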

Corollary 14.4 Fundamental Theorem of Algebra 
Every non-constant polynomial in C has a root. 

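Proof The roots of the polynomial equation $z^n + a_{n-1}z^{n-1} + \cdots + a_0 = 0$ are
precisely the spectral values of the matrix

\[ \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & & 0 & -a_1 \\ 0 & 1 & \ddots & & \vdots \\ \vdots & & \ddots & 0 & -a_{n-2} \\ 0 & \cdots & 0 & 1 & -a_{n-1} \end{pmatrix}. \]

□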
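One way to see the link between roots and spectral values is that the row vector $(1, \lambda, \ldots, \lambda^{n-1})$ is a left eigenvector of this matrix exactly when $\lambda$ is a root. The sketch below checks this with integer arithmetic; the cubic $(z-1)(z-2)(z-3)$ and the function names are illustrative assumptions.

```python
def companion(coeffs):
    """Companion matrix of z^n + a_{n-1} z^{n-1} + ... + a_0, with coeffs = [a_0, ..., a_{n-1}]."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1            # ones on the subdiagonal
    for i in range(n):
        C[i][n - 1] = -coeffs[i]   # last column holds -a_0, ..., -a_{n-1}
    return C

def left_action(w, C):
    """Row vector w times the matrix C."""
    n = len(C)
    return [sum(w[i] * C[i][j] for i in range(n)) for j in range(n)]

def is_left_eigenvalue(coeffs, lam):
    """True when (1, lam, ..., lam^(n-1)) C = lam (1, lam, ..., lam^(n-1))."""
    w = [lam ** i for i in range(len(coeffs))]
    return left_action(w, companion(coeffs)) == [lam * x for x in w]

cubic = [-6, 11, -6]   # z^3 - 6 z^2 + 11 z - 6 = (z - 1)(z - 2)(z - 3)
```

The first $n-1$ components of the identity hold for every $\lambda$; the last one reads $-a_0 - a_1\lambda - \cdots - a_{n-1}\lambda^{n-1} = \lambda^n$, which is the polynomial equation itself.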


Examples 14.5 

1. The smallest extent of $\sigma(T)$ is $\rho(T^{-1})^{-1}$ when $T$ is invertible (otherwise it is 0).
Thus the condition $r < \rho(T^{-1})^{-1} \leq \rho(T) < R$ for a Laurent series expansion
to exist (Theorem 13.28) can be restated as “the spectrum of $T$ lies inside the ring
with radii $r$ and $R$”.

2. ► Every Banach division algebra is isomorphic to $\mathbb{C}$ (Gelfand-Mazur theorem).

Proof A division algebra is defined as one in which the only non-invertible element
is 0. Hence $T - \lambda$ is not invertible precisely when $T = \lambda \in \mathbb{C}1$. But $\sigma(T)$ is
non-empty, so this must be the case for some $\lambda$.

3. ► Every Banach algebra, except $\mathbb{C}$, has non-zero topological divisors of zero.

Proof Suppose that the only topological divisor of zero is 0. Since the spectrum
$\sigma(T)$ of every $T$ has a non-empty boundary (Proposition 5.3), there is a $T - \lambda$
which is a topological divisor of zero, so $T = \lambda \in \mathbb{C}1$.

4. ► Every commutative Banach algebra, except $\mathbb{C}$, has non-trivial ideals.

Proof Suppose the only ideals are $\{0\}$ and $\mathcal{X}$. Then the ideal generated by $T \neq 0$,
namely $\mathcal{X}T$ (in a commutative algebra), must equal $\mathcal{X}$. It follows that $ST = 1$
for some $S \in \mathcal{X}$, and $T$ is invertible. But the only Banach division algebra is $\mathbb{C}$.

5. A morphism $J : \mathcal{X} \to \mathcal{Y}$ may only decrease the spectrum of an element, since
a non-invertible element in $\mathcal{X}$ may become invertible in $\mathcal{Y}$, but an invertible element in
$\mathcal{X}$ cannot become non-invertible in $\mathcal{Y}$. If $J$ is an embedding, the boundary of
the spectrum in $\mathcal{X}$, consisting of topological divisors of zero, is preserved in
$\mathcal{Y}$ (Exercise 13.25(14)). The spectrum may decrease but its boundary (and the
spectral radius) does not.

6. Recall the commutant algebra $\mathcal{Y} := \mathcal{A}'' \subseteq \mathcal{X}$ when the elements of $\mathcal{A}$ commute
(Exercise 13.10(14)). By part (c) of that exercise, for any $T \in \mathcal{Y}$, if $T - \lambda$ is
invertible in $\mathcal{X}$ then its inverse is in $\mathcal{Y}$, so $\sigma_{\mathcal{Y}}(T) = \sigma(T)$.

Little else can be said about spectra of general elements of an algebra. The
following proposition shows that the spectrum $\sigma(T)$ depends somewhat ‘continuously’
on $T$:

Proposition 14.6

If $T_n \to T$, then

\[ \forall \epsilon > 0,\ \exists N,\quad n \geq N \Rightarrow \sigma(T_n) \subseteq \sigma(T) + B_\epsilon(0). \]





while on the remaining closed and bounded set \ U, the continuous function 


$z \mapsto \|(T - z)^{-1}\|$ is bounded (Corollary 6.16). If $\|T - S\|$ is small enough, then when $z \notin U$,
$\|(T - z)^{-1}(T - S)\| < 1$. This implies that

\[ S - z = (T - z) - (T - S) = (T - z)\bigl(1 - (T - z)^{-1}(T - S)\bigr) \]

is invertible (Theorem 13.20). Thus $\sigma(S) \subseteq U$, and we have shown that any open
set that contains $\sigma(T)$ also contains $\sigma(S)$ for $S$ close enough to $T$.

For example, if $U := \sigma(T) + B_\epsilon(0)$ and $T_n$ is close enough to $T$, then
$\sigma(T_n) \subseteq U$. □

Exercises 14.7 

1. The spectrum of $(z_1, \ldots, z_N) \in \mathbb{C}^N$ is $\{z_1, \ldots, z_N\}$.

2. The spectrum of $f \in C[0, 1]$ is $\sigma(f) = \operatorname{im}(f)$.

3. Verify directly that for a matrix $A$ with eigenvalue $\lambda$, $A - \lambda$ is a divisor of zero.

4. * Prove that $\sigma(T^2) = \sigma(T)^2 = \{\lambda^2 : \lambda \in \sigma(T)\}$. (We will see later a broad
generalization of this (Theorem 14.25).)

5. Show that $\sigma(LR) = \{1\}$, but $\sigma(RL) = \{0, 1\}$, where $L$ and $R$ are the shift
operators.

6. Show that $ST - TS = z \neq 0$ for $S, T \in \mathcal{X}$ implies $\sigma(ST)$ is unbounded, which
is impossible (Hint: $\lambda \in \sigma(ST) \Rightarrow \lambda + z \in \sigma(ST)$).

7. The spectrum of $(S, T) \in \mathcal{X} \times \mathcal{Y}$ is $\sigma(S) \cup \sigma(T)$.

8. If $T \in B(X)$ and $S \in B(Y)$, let $T \oplus S : X \times Y \to X \times Y$ be defined by
$T \oplus S(x, y) := (Tx, Sy)$. Then $\sigma(T \oplus S) = \sigma(T) \cup \sigma(S)$.

9. If $\lambda$ is a boundary point of the spectrum, then $T - \lambda$ is at the boundary of $\mathcal{G}(\mathcal{X})$,
and so is a topological divisor of zero (Proposition 13.23). Moreover, if $T - \mu$ is
invertible, then

\[ \|(T - \mu)^{-1}\| \geq 1/d(\mu, \sigma(T)). \]


14.2 The Spectrum of an Operator 


An operator $T$ on a Banach space $X$ is invertible in $B(X)$ when $T$ has a continuous
and linear inverse $T^{-1} \in B(X)$. By the open mapping theorem, this is automatically
true once $T$ is bijective. So an operator $T \in B(X)$ is not invertible when one of the
following cases holds:


$T$ not invertible in $B(X)$
    $T$ not 1-1
    $T$ is 1-1 but not onto
        $\overline{\operatorname{im} T} = X$
        $\overline{\operatorname{im} T} \neq X$

• $T$ is not 1-1 (i.e., $\ker T \neq 0$). In this case, $T$ is a left divisor of zero as $TS = 0$
for any non-zero $S \in B(X)$ with $\operatorname{im} S \subseteq \ker T$.

• $T$ is 1-1, but not onto, yet it is “almost” onto, in the sense that its image is dense,
$\overline{\operatorname{im} T} = X$. Here, it cannot be the case that $\|Tx\| \geq c\|x\|$ for all $x$ and some $c > 0$,
otherwise $\operatorname{im} T$ would be closed (Example 8.13(3)) and $T$ onto. This means that
one can decrease $\|Tx\|$ but keep $\|x\|$ fixed, i.e., there are unit vectors $x_n$ such that
$Tx_n \to 0$. By taking any unit operators with $\operatorname{im} S_n = [\![x_n]\!]$, we get $TS_n \to 0$, so
$T$ is a topological left divisor of zero.

• $T$ is 1-1, and its image is not even dense in $X$. In this case, by Proposition 11.18,
there is a non-zero $S \in B(X)$ with kernel containing $\operatorname{im} T$, so $ST = 0$, and $T$ is
a right divisor of zero.

The spectrum of an operator $T \in B(X)$ thus consists of $\lambda$ in:

• the point spectrum $\sigma_p(T)$, when $T - \lambda$ is not 1-1, i.e., $Tx = \lambda x$ for some $x \neq 0$;
we say that $\lambda$ is an eigenvalue and $x$ an eigenvector of $\lambda$ (note that a non-zero
multiple of an eigenvector is another eigenvector, so they are often taken to be of
unit length); the subspace $\ker(T - \lambda)$ of eigenvectors of $\lambda$ (together with the zero
vector) is called its eigenspace.

• the continuous spectrum $\sigma_c(T)$, when $T - \lambda$ is 1-1, not onto, but $\overline{\operatorname{im}(T - \lambda)} = X$.

• the residual spectrum $\sigma_r(T)$, when $T - \lambda$ is 1-1, and $\overline{\operatorname{im}(T - \lambda)} \neq X$.

Proposition 14.8 

Eigenvectors of distinct eigenvalues are linearly independent. 

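Proof Let $v_i \neq 0$ be eigenvectors associated with the distinct eigenvalues $\lambda_i$,
$i = 1, 2, \ldots$, so that $(T - \lambda)v_i = (\lambda_i - \lambda)v_i$. The sum $\sum_{i=1}^N \alpha_i v_i = 0$ implies

\[ 0 = (T - \lambda_2) \cdots (T - \lambda_N) \sum_{i=1}^N \alpha_i v_i = \alpha_1(\lambda_1 - \lambda_2) \cdots (\lambda_1 - \lambda_N)v_1, \]

forcing $\alpha_1 = 0$. Since the argument can be repeated for any other index $i$, we have
$\alpha_i = 0$. □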
Proposition 14.9


If $\lambda$ is a limit of eigenvalues, or is in $\sigma_c(T)$, or is a boundary point of $\sigma(T)$,
then $\lambda$ is an approximate eigenvalue, meaning there are unit vectors $x_n$
such that

\[ (T - \lambda)x_n \to 0 \quad \text{as } n \to \infty. \]


Proof If $\lambda_n \to \lambda$ and $Tx_n = \lambda_n x_n$ with $\|x_n\| = 1$, then

\[ (T - \lambda)x_n = (\lambda_n - \lambda)x_n \to 0. \]

$\lambda$ is an approximate eigenvalue exactly when $T - \lambda$ is a topological left divisor
of zero: suppose there are unit operators $S_n$ with $(T - \lambda)S_n \to 0$. Let $x_n$
be vectors such that $\|x_n\| = 1$ and $\|S_n x_n\| \geq \frac{1}{2}$ (possible since $\|S_n\| = 1$); then
$(T - \lambda)S_n x_n \to 0$, and $\lambda$ is an approximate eigenvalue.

Conversely, given $(T - \lambda)x_n \to 0$ with $x_n$ unit vectors, let $S_n := x_n\phi$ for any
$\phi \in X^*$ with unit norm. Then $\|S_n\| = 1$ and $(T - \lambda)S_n = (T - \lambda)x_n\phi \to 0$ as
$n \to \infty$.

This includes the case when $\lambda$ is at the boundary of $\sigma(T)$ (Proposition 13.23),
and when $\lambda \in \sigma_c(T)$ as we have just seen at the beginning of this section. □

Examples 14.10 

1. ► The spectrum of the left-shift operator $L(a_n) := (a_{n+1})$, on $\ell^\infty$ is the closed
unit ball.

Proof The norm of $L$ is 1, so $\sigma(L) \subseteq B_1[0]$. To find its eigenvalues, we need to
solve $Lx = \lambda x$ for some non-zero $x = (a_n) \in \ell^\infty$, i.e.,

\[ \forall n,\quad a_{n+1} = \lambda a_n, \quad |a_n| \leq c. \]

This recurrence relation gives $a_n = \lambda^n a_0$, satisfying $|a_0||\lambda|^n = |a_n| \leq c$. Thus
the only possible candidates for eigenvalues are $|\lambda| \leq 1$. In fact, for any such $\lambda$,
the sequence $(1, \lambda, \lambda^2, \ldots)$ is an eigenvector in $\ell^\infty$. Hence $\sigma(L) = B_1[0]$, and
all spectral points are eigenvalues.

2. ► The spectrum of the left-shift operator on $\ell^1$ is the closed unit ball.

Proof The same analysis as in Example 1 applies: $\rho(L) \leq \|L\| = 1$, and $a_n =
\lambda^n a_0$. This time, the condition $x \in \ell^1$ is $\sum_n |a_n| = |a_0| \sum_n |\lambda|^n < \infty$. This
is only possible when $|\lambda| < 1$. Once again, but only for $|\lambda| < 1$, the sequence
$(1, \lambda, \lambda^2, \ldots)$ is an eigenvector in $\ell^1$. Still, since it is closed, bounded by 1, and
contains $B_1(0)$, the spectrum must be the closed disk. The spectral values in
the interior are eigenvalues, and those on the circular perimeter are approximate
eigenvalues.
3. Let $T : \ell^2 \to \ell^2$ be the multiplier operator $T(a_n) := (b_n a_n)$ where $b_n$ are
bounded. Its eigenvalues are $b_n$, and its spectrum is $K := \overline{\{b_1, b_2, \ldots\}}$.

Proof For eigenvalues, $T(a_n) = (b_n a_n) = \lambda(a_n)$, so $(b_n - \lambda)a_n = 0$ for all $n$.
This implies $\lambda = b_n$ for some $n$, otherwise $(a_n) = 0$. In fact, $Te_n = b_n e_n$, so $b_n$
is indeed an eigenvalue. Now, suppose $\lambda$ is neither one of the $b_n$ nor a limit point of
$\{b_1, b_2, \ldots\}$; there is then a minimum positive distance between $\lambda$ and $K$, i.e.,
$|\lambda - b_n| \geq d > 0$. So the equation $(T - \lambda)(a_n) = (c_n)$ can be inverted,
$a_n = c_n/(b_n - \lambda)$, with $|a_n| \leq |c_n|/d$, so $\|(T - \lambda)^{-1}\| \leq 1/d$. The spectrum
therefore must include the eigenvalues and their limit points, but nothing else.

4. Let $T : L^\infty[0, 1] \to L^\infty[0, 1]$ be defined by $Tf(x) := \int_{1-x}^{1} f(s)\,ds$. Then $T$ is
linear, and continuous with $\|T\| \leq 1$ since

\[ \|Tf\|_{L^\infty} = \sup_{x \in [0,1]} \Bigl| \int_{1-x}^{1} f(s)\,ds \Bigr| \leq \|f\|_{L^\infty} \sup_{x \in [0,1]} \int_{1-x}^{1} ds = \|f\|_{L^\infty}. \]

For eigenvalues, we need to solve $\int_{1-x}^{1} f(t)\,dt = \lambda f(x)$. Differentiating twice
gives $f''(x) + f(x)/\lambda^2 = 0$ with boundary conditions $f(0) = 0 = f'(1)$. Thus
the eigenvectors (or “eigenfunctions”) are $f(x) = \sin(x/\lambda)$ with eigenvalues
$\lambda = 2/k\pi$, $k$ odd. The spectrum must also include 0, because it is their limit
point, but at this stage we cannot conclude anything further about the spectrum.

5. If $S : X \to Y$, $T : Y \to X$ are operators, then $ST$ and $TS$ share the same
non-zero eigenvalues.

Proof If $STx = \lambda x$ ($x \neq 0$), then $TS(Tx) = T(ST)x = \lambda(Tx)$, so either
$Tx = 0$, in which case $\lambda = 0$, or $Tx$ is an eigenvector of $TS$ with the same
eigenvalue $\lambda$; similarly, every non-zero eigenvalue of $TS$ is also an eigenvalue of
$ST$. (Compare with Example 14.2(5d).)

6. Gershgorin’s theorem: If $T = [T_{ij}]$ is an operator on $c_0$, then each eigenvalue
belongs to a disk $B_r[T_{jj}]$ for some $j$, where $r := \sum_{i \neq j} |T_{ji}|$.

Proof Let $x = (a_i)$ be an eigenvector of $T$ and let $|a_j|$ be its largest coefficient.
Then rearranging $Tx = \lambda x$ we get

\[ \lambda a_j = \sum_i T_{ji} a_i = T_{jj} a_j + \sum_{i \neq j} T_{ji} a_i, \]

\[ |\lambda - T_{jj}||a_j| \leq \sum_{i \neq j} |T_{ji}||a_i| \leq r|a_j|. \]


7. Real eigenvalues of real operators have real eigenvectors, i.e., if $X$ is a real Banach
space, then $T \in B(X)$ is not guaranteed to have a spectral element, but it will have
when considered as an operator on the complex space $X + iX$. Nevertheless if
the eigenvalue is real, with eigenvector $u + iv$, then $u$ and $v$ are also eigenvectors
(unless 0),

\[ T(u + iv) = \lambda(u + iv) \Rightarrow Tu = \lambda u,\ Tv = \lambda v. \]
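Gershgorin’s theorem (Example 6 above) is easiest to try out on a finite matrix, which can be regarded as an operator acting on the first few coordinates. The sketch below does this for a $2 \times 2$ matrix, with eigenvalues computed from the quadratic formula; the matrix and the function names are assumptions made for illustration.

```python
import cmath

def eigenvalues_2x2(M):
    """Both eigenvalues of a 2 x 2 matrix, from its characteristic polynomial."""
    (a, b), (c, d) = M
    mean = (a + d) / 2
    half_gap = cmath.sqrt(((a - d) / 2) ** 2 + b * c)
    return mean + half_gap, mean - half_gap

def gershgorin_disks(M):
    """One (centre, radius) pair per row: centre T_jj and radius sum_{i != j} |T_ji|."""
    n = len(M)
    return [(M[j][j], sum(abs(M[j][i]) for i in range(n) if i != j)) for j in range(n)]

def in_some_disk(lam, disks, tol=1e-12):
    """True when lam lies in at least one closed Gershgorin disk."""
    return any(abs(lam - centre) <= radius + tol for centre, radius in disks)

M = [[4.0, 1.0], [2.0, -3.0]]
disks = gershgorin_disks(M)      # [(4.0, 1.0), (-3.0, 2.0)]
spectrum = eigenvalues_2x2(M)    # roughly 4.27 and -3.27
```

Each eigenvalue sits within distance $0.28$ of a diagonal entry, well inside the disks of radii 1 and 2.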


The Spectrum of the Adjoint 

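There is a relation between the eigenvalues of $T^\top$ and the residual spectrum of $T$:

Proposition 14.11

\[ \sigma(T^\top) = \sigma(T), \]
\[ \sigma_r(T) \subseteq \sigma_p(T^\top) \subseteq \sigma_p(T) \cup \sigma_r(T), \]
\[ \sigma_c(T^\top) \subseteq \sigma_c(T). \]

Proof (i) $T - \lambda$ is invertible in $B(X)$ if and only if its adjoint is invertible (Exercise
11.32(7)),

\[ (T^\top - \lambda)^{-1} = \bigl((T - \lambda)^{-1}\bigr)^\top. \]

So $\lambda \in \sigma(T) \Leftrightarrow \lambda \in \sigma(T^\top)$.

(ii) By definition, $\lambda \in \sigma_p(T^\top)$ when there is a $\phi \neq 0$ in $X^*$ such that

\[ \phi \circ (T - \lambda) = (T^\top - \lambda)\phi = 0. \]

This implies there is an $x \in X$, $\phi x \neq 0$, so that $x \notin \overline{\operatorname{im}(T - \lambda)}$. In turn, if $x \in
X \setminus \overline{\operatorname{im}(T - \lambda)}$ exists, then there is a $\phi \neq 0$ such that $\phi(T - \lambda) = 0$ (Proposition
11.18), and we have proved

\[ \lambda \in \sigma_p(T^\top) \Leftrightarrow \overline{\operatorname{im}(T - \lambda)} \neq X. \]

This condition is certainly satisfied when $\lambda$ is a residual spectral value of $T$, but
not when it is in the continuous spectrum of $T$, so

\[ \lambda \in \sigma_r(T) \Rightarrow \lambda \in \sigma_p(T^\top) \Rightarrow \lambda \notin \sigma_c(T). \]

(iii) When $A^\top$ is 1-1 but $\overline{\operatorname{im} A^\top} = X^*$, then we can infer, by Proposition 11.30,
that (a) $(\ker A)^\perp \supseteq \overline{\operatorname{im} A^\top} = X^*$, so $A$ is 1-1; and (b) $(\operatorname{im} A)^\perp = \ker A^\top = 0$, so
$\overline{\operatorname{im} A} = X$. Applying this to $A := T - \lambda$ when $\lambda \in \sigma_c(T^\top)$, we find that $T - \lambda$ is
1-1 and has a dense image, that is, $\lambda \in \sigma_c(T)$. □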

Examples 14.12 

1. When $T^{\top\top} = T$ (e.g. on a Hilbert space) then $\sigma_r(T^\top) \subseteq \sigma_p(T)$ as well as
$\sigma_c(T^\top) = \sigma_c(T)$.

2. In $c_0$ (or $\ell^2$), the left-shift and right-shift operators have

\[ \sigma_p(L) = B_1(0), \quad \sigma_r(L) = \emptyset, \quad \sigma_c(L) = S^1, \]
\[ \sigma_p(R) = \emptyset, \quad \sigma_r(R) = B_1(0), \quad \sigma_c(R) = S^1. \]

Proof That $\sigma_p(L^\top) = \emptyset$ has already been shown since $L^\top$ is the right shift on
$\ell^1$; in the same way can be proved $\sigma_p(L) = B_1(0)$. Applying this proposition,
we find that $\sigma_r(L) \subseteq \sigma_p(L^\top) = \emptyset$, leaving $\sigma_c(L) = S^1$.

Similarly for $R$, $\sigma_r(R) \subseteq \sigma_p(R^\top) \subseteq \sigma_r(R)$ since $\sigma_p(R) = \emptyset$ (prove!), hence
$\sigma_r(R) = \sigma_p(R^\top) = B_1(0)$ and $\sigma_c(R) = S^1$.

Exercises 14.13 

1. Show that the right-shift operator $R$ (on $\ell^\infty$ or $\ell^1$) has no eigenvalues.

2. The right-shift operator $R \in B(\ell^1)$ and its adjoint $L \in B(\ell^\infty)$ have spectra

\[ \sigma(L) = \sigma_p(L) = B_1[0] = \sigma_r(R) = \sigma(R). \]

3. The spectrum of $L$ on $\ell^1(\mathbb{Z})$ is the circle $S^1$. This is an example of the hollowing
out of a spectrum when the algebra increases, in this case when $\ell^1$ is embedded
in $\ell^1(\mathbb{Z})$.

4. The operator $T(a_0, a_1, \ldots) := (a_0, 0, a_1, a_2, \ldots)$, on $c_0$, has a single eigenvalue
1, but its adjoint has $\sigma_p(T^\top) = B_1(0) \cup \{1\}$. Deduce that $\sigma_p(T) = \{1\}$, $\sigma_r(T) =
B_1(0)$, and $\sigma_c(T) = S^1 \setminus \{1\}$.

But the same operator restricted to $\ell^1$ has a single eigenvalue 1 and no continuous
spectrum.

5. The operator $T(a_0, a_1, \ldots) := (a_0, 0, a_1, a_2/2, a_3/3, \ldots)$, on $c_0$, has a single
eigenvalue 1, and its adjoint has two eigenvalues, 1 and 0.

6. The spectrum of the multiplier operator $Tx := ax$, on $\ell^2$, has no residual
spectrum.

7. The spectrum of $x\phi \in B(X)$, where $x \in X$ and $\phi \in X^*$, consists of the
eigenvalues $\phi x$ and 0 (unless $X$ is 1-dimensional).

8. Let $T : X \to Y$, $S : Y \to X$ be operators and consider $R \in B(X \times Y)$ defined by
$R(x, y) := (Sy, Tx)$; the ‘matrix’ form of $R$ looks like $\begin{pmatrix} 0 & S \\ T & 0 \end{pmatrix}$. Then non-zero
eigenvalues of $R$ come in pairs $\pm\lambda$. (Hint: consider $(x, -y)$.)

9. Let $T : C[0, 1] \to C[0, 1]$ be defined by $Tf(x) := xf(x)$. Show that $T$ is linear
and continuous, find its norm and show that its spectrum is the line $[0, 1]$ in $\mathbb{C}$,
consisting of only the residual part.

More generally the spectrum of $Tf := gf$ in $C[0, 1]$, where $g \in C[0, 1]$, is
$\operatorname{im} g$.

The reader is encouraged to explore the spectrum of this operator in other spaces,
such as $L^1[0, 1]$ or $L^2[0, 1]$.

10. * Let $V : C[0, 1] \to C[0, 1]$ be the Volterra operator $Vf(x) := \int_0^x f$. Show
that

\[ V^{n+1}f(x) = \frac{1}{n!} \int_0^x (x - y)^n f(y)\,dy, \]

and that $\|V^n\| \leq 1/n!$. Deduce, using the spectral radius formula, that its
spectrum is just $\{0\}$. Show that 0 is not an eigenvalue (hint: differentiate!) but a
residual boundary spectral value.

11. Find the eigenvalues of $Tf(x) := \int_0^1 x^2y^2 f(y)\,dy$ on $C[0, 1]$.

12. The spectrum of an isometry $T$ lies in $B_1[0]$. Any eigenvalues or approximate
eigenvalues lie in $e^{i\mathbb{R}}$. If $T$ is an invertible isometry, then $\sigma(T) \subseteq e^{i\mathbb{R}}$, otherwise
the spectrum must be the whole closed unit disk (e.g. the right-shift operator).
(Hint: $T - \lambda = T(1 - \lambda S)$.)

13. Show that the set $\{T \in B(X) : T \text{ is 1-1 and has a closed image}\}$ is open in
$B(X)$. (Hint: Proposition 11.3.)
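For Exercise 10, a small computation shows how quickly the bound $1/n!$ forces the spectral radius to vanish. The sketch below applies $V$ exactly to polynomials, stored as coefficient lists, and evaluates the $n$-th root of $1/n!$; the function names are assumptions made for this illustration, and it is a numerical aid, not a solution of the exercise.

```python
import math
from fractions import Fraction

def volterra(coeffs):
    """Apply V f(x) = integral of f from 0 to x to the polynomial sum coeffs[k] x^k."""
    return [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(coeffs)]

def volterra_power_of_one(n):
    """Coefficients of V^n applied to the constant function 1, namely x^n / n!."""
    p = [Fraction(1)]
    for _ in range(n):
        p = volterra(p)
    return p

def root_of_bound(n):
    """The n-th root of the bound 1/n! on ||V^n||, computed through log-gamma."""
    return math.exp(-math.lgamma(n + 1) / n)
```

Since $V^n 1 = x^n/n!$ has sup-norm $1/n!$ on $[0, 1]$, the bound is attained by the constant function, and `root_of_bound(n)` behaves like $e/n$ for large $n$.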

14.3 Spectra of Compact Operators 

Ascents and Descents 

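For any operator, the eigenspace associated with an eigenvalue $\lambda$ is $\ker(T - \lambda)$.
But this is not the whole story: for example, $T := \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ has just one eigenvalue,
and a one-dimensional eigenspace generated by $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$; the vector $v := \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ is mapped
by $T$ to $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$, and only a second application of $T$ kills it off. We can think of it as
a “generalized” eigenvector, with $(T - \lambda)^2 v = 0$. In general, one can consider the
spaces of vectors that vanish when $(T - \lambda)^n$ is applied to them. Two nested sequences
of spaces can be formed (here shown for $\lambda = 0$),

• an ascending sequence

\[ 0 \subseteq \ker T \subseteq \ker T^2 \subseteq \cdots \subseteq \ker T^n \subseteq \cdots \subseteq \bigcup_n \ker T^n, \]

• a descending sequence

\[ X \supseteq \operatorname{im} T \supseteq \operatorname{im} T^2 \supseteq \cdots \supseteq \operatorname{im} T^n \supseteq \cdots \supseteq \bigcap_n \operatorname{im} T^n. \]
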
Finite Ascents and Descents

Suppose there is an $n$ such that $\ker T^n = \ker T^{n+1}$, i.e., for all $x$,

\[ T^n x = 0 \Leftrightarrow T^{n+1}x = 0. \]

Substituting $Tx$ instead of $x$ gives

\[ T^{n+1}x = 0 \Leftrightarrow T^{n+2}x = 0 \]

and $\ker T^{n+2} = \ker T^{n+1} = \ker T^n$. By induction, all the subsequent spaces in the
ascending sequence are identical, $\ker T^{n+k} = \ker T^n$. Operators with this property
are said to have a finite ascent up to $n$, $0 \subseteq \ker T \subseteq \cdots \subseteq \ker T^n$.

Similarly, if $\operatorname{im} T^m = \operatorname{im} T^{m+1}$ then for any $x \in \operatorname{im} T^{m+1}$,

\[ x = T^{m+1}y = T(T^m y) = T(T^{m+1}z) = T^{m+2}z \in \operatorname{im} T^{m+2}. \]

By induction, $\operatorname{im} T^{m+k} = \operatorname{im} T^m$. Operators with this property are said to have a
finite descent down to $m$.

Proposition 14.14 
An operator $T$ has

(i) finite ascent up to at most $n$ $\Leftrightarrow$ $\operatorname{im} T^n \cap \ker T^k = 0$, $\forall k$,

(ii) finite descent down to at most $m$ $\Leftrightarrow$ $X = \ker T^m + \operatorname{im} T^k$, $\forall k$,

(iii) finite ascent up to $n$ and descent down to $m$ $\Rightarrow$ $m = n$ and

\[ X = \ker T^n \oplus \operatorname{im} T^n. \]

Proof (i) If $\operatorname{im} T^n \cap \ker T = 0$, then $T^{n+1}x = 0 \Rightarrow T^n x \in \operatorname{im} T^n \cap \ker T = 0$,
and $T$ has finite ascent up to at most $n$.

For the converse, let $x \in \operatorname{im} T^n \cap \ker T^k$, that is, $x = T^n y$ and $T^k x = 0$. Then
$T^{n+k}y = 0$ and $y \in \ker T^{n+k} = \ker T^n$; so $x = T^n y = 0$.

(ii) Let $x \in X$, then $T^m x = T^{m+1}y = \cdots = T^{m+k}z$, assuming finite descent to $m$.
So $T^m(x - T^k z) = 0$ and $x = T^k z + (x - T^k z) \in \operatorname{im} T^k + \ker T^m$.

Conversely, if $X = \operatorname{im} T + \ker T^m$, then for any $x = Ty + z$, we have $T^m x = T^{m+1}y$
and $\operatorname{im} T^m = \operatorname{im} T^{m+1}$.

(iii) Suppose $\operatorname{im} T^n = \operatorname{im} T^{n+1}$, but $\ker T^n \subset \ker T^{n+1}$. Then there is an $x_1$ such
that $T^{n+1}x_1 = 0$ but

\[ 0 \neq T^n x_1 = T^{n+1}x_2 = T^{n+2}x_3 = \cdots \]

so $x_k \in \ker T^{n+k} \setminus \ker T^{n+k-1}$, and $T$ has an infinite ascent. This shows that a finite
ascent cannot be longer than the descent.

Next suppose the ascent goes up to $\ker T^n = \ker T^{n+1}$ but the descent goes down to
$\operatorname{im} T^m = \operatorname{im} T^{m+1}$ with $m \geq n$. Then for any $x \in X$, there is a $y$ such that

\[ T^m x = T^{m+1}y \Rightarrow T^m(x - Ty) = 0 \Rightarrow x - Ty \in \ker T^m = \ker T^n \Rightarrow T^n x = T^{n+1}y \]

so a finite descent cannot be longer than the ascent.

Combining the results of (i) and (ii) gives $X = \ker T^n \oplus \operatorname{im} T^n$. □
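In finite dimensions the ascent and descent can be computed from ranks, since $\ker T^n = \ker T^{n+1}$ exactly when $T^n$ and $T^{n+1}$ have equal rank. The following sketch uses exact Gaussian elimination; the matrix and all function names are assumptions made for illustration.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def rank(M):
    """Rank by Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def power(T, n):
    """T^n, with T^0 the identity matrix."""
    size = len(T)
    P = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        P = matmul(P, T)
    return P

def ascent(T):
    """Smallest n with rank T^n = rank T^(n+1), i.e. ker T^n = ker T^(n+1)."""
    n = 0
    while rank(power(T, n)) != rank(power(T, n + 1)):
        n += 1
    return n

T = [[0, 1, 0], [0, 0, 0], [0, 0, 2]]   # a nilpotent 2 x 2 block next to an invertible entry
n = ascent(T)                            # the ranks of T^0, T^1, T^2, T^3 are 3, 2, 1, 1
```

Here $\operatorname{rank} T^{2n} = \operatorname{rank} T^n$ confirms that $T^n$ is 1-1 on $\operatorname{im} T^n$, so $\operatorname{im} T^n \cap \ker T^n = 0$ and the two subspaces split the space as in part (iii).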

Proposition 14.15 (Fredholm Alternative) 


A Fredholm operator $T$ with

(i) finite ascent, satisfies $\operatorname{index}(T) \leq 0$,

(ii) finite descent, satisfies $\operatorname{index}(T) \geq 0$,

(iii) finite ascent and descent, satisfies $\operatorname{index}(T) = 0$ and

$T$ is 1-1 $\Leftrightarrow$ $T$ is onto.


Proof Recall that the codimension of a closed subspace $Y \subseteq X$ is defined as
$\dim(X/Y)$, that Fredholm operators have finite-dimensional kernels and finite
codimensional images, and $\operatorname{index}(T) = \dim\ker T - \operatorname{codim}\operatorname{im} T$ (Definition 11.12). For
$T$ with finite ascent to $n$, by the index theorem,

\[ 0 \leq \operatorname{codim}\operatorname{im} T^k = \dim\ker T^k - \operatorname{index}(T^k) = \dim\ker T^n - k\operatorname{index}(T), \quad \text{for } k \geq n. \]

Since $k$ can be arbitrarily large, it must be the case that $\operatorname{index}(T) \leq 0$.

For Fredholm operators with finite descent to $m$,

\[ 0 \leq \dim\ker T^k = \operatorname{codim}\operatorname{im} T^k + \operatorname{index}(T^k) = \operatorname{codim}\operatorname{im} T^m + k\operatorname{index}(T), \quad \text{for } k \geq m. \]

This time, we must have $\operatorname{index}(T) \geq 0$.

A special case is when $m = n = 0$, known as the Fredholm alternative: $\ker T = 0$
if, and only if, $\operatorname{im} T = X$, i.e., $T$ is 1-1 $\Leftrightarrow$ $T$ is onto; in other words, $T$ is either
invertible or it is neither 1-1 nor onto. □
Ivar Fredholm (1866-1927) studied p.d.e.s under Mittag-Leffler
in 1893 at the new University of Stockholm; he saw the
connection between Volterra’s equation and potential theory,
especially in 1899 while working on Dirichlet’s problem; in 1903
he analyzed the theory of general integral equations
$f(x) - \lambda \int k(x, y)f(y)\,dy = g(x)$ covering much that was then known
about boundary value problems (mostly self-adjoint), proved
the Fredholm alternative and defined the Fredholm
determinant $\det(1 - K) = e^{-\sum_n \frac{1}{n}\operatorname{tr} K^n}$. He was then ‘distracted’ by
actuarial science and government.


Fig. 14.1 Fredholm


Examples 14.16 

1. The spaces $M := \operatorname{im} T^n$ and $N := \ker T^n$ are both $T$-invariant and such that
$T|_M$ is an isomorphism while $T|_N$ is nilpotent.

2. For matrices, the Fredholm alternative boils down to the statement that either
$Ax = b$ has a unique solution or $Ax = 0$ has non-trivial solutions.

3. The Fredholm alternative only applies to (Fredholm) operators with finite ascent
and descent; e.g. the right-shift operator is 1-1 but not onto.

4. If $T$ is Fredholm with finite ascent and descent, then $\dim\ker T = \dim\ker T^\top$
(Exercise 11.32(10)).


The Spectrum of a Compact Operator 

The following two results are peaks in the landscape of Operator Theory. 

Proposition 14.17 

Let $T : X \to X$ be compact on a Banach space $X$, then $I - T$ is a Fredholm
operator with finite ascent and descent.


Proof $I - T$ is Fredholm since it is invertible up to the compact operator $T$
(Proposition 11.14).

Suppose $S := I - T$ has infinite ascent, so $\ker S^{n-1} \subset \ker S^n$. By Riesz’s lemma
(Proposition 8.20), choose unit vectors $x_n \in \ker S^n$ with $\|x_n + \ker S^{n-1}\| \geq \frac{1}{2}$. Then
for $m < n$,

\[ \|Tx_n - Tx_m\| = \|(x_n - x_m) - S(x_n - x_m)\| \geq \frac{1}{2} \]

since $S^{n-1}(x_m + S(x_n - x_m)) = 0$. So $(Tx_n)$ has no Cauchy subsequence,
contradicting the compactness of $T$.

Suppose $S$ has infinite descent, with $\operatorname{im} S^n \supset \operatorname{im} S^{n+1}$. One can choose unit
vectors $x_n \in \operatorname{im} S^n$ with $\|x_n + \operatorname{im} S^{n+1}\| \geq \frac{1}{2}$. Then for $m > n$,

\[ \|Tx_n - Tx_m\| = \|(x_n - x_m) - S(x_n - x_m)\| \geq \frac{1}{2} \]

since $x_m + S(x_n - x_m) \in \operatorname{im} S^{n+1}$. Again this would contradict the hypothesis.

It follows from the propositions above, that the index of $S = I - T$ vanishes and
$\dim\ker(S^\top) = \dim\ker S$. □

Theorem 14.18 Riesz-Schauder 


If $T \in B(X)$ is compact, then

(i) its spectrum $\sigma(T)$ is a countable set, whose only possible limit point
is 0,

(ii) each non-zero $\lambda \in \sigma(T)$ is an eigenvalue with a finite-dimensional
eigenspace $\ker(T - \lambda)$,

(iii) $T^\top$ has the same non-zero eigenvalues and eigenspace dimensions as
$T$.


Proof For $\lambda \neq 0$, $T - \lambda = -\lambda(I - T/\lambda)$ is a Fredholm operator with finite ascent and
descent, so its kernel is finite dimensional and it satisfies the Fredholm alternative,
namely it is either invertible ($\lambda \notin \sigma(T)$) or not 1-1 ($\lambda$ is an eigenvalue). $T - \lambda$ has
index 0, so $T^\top$ has the same number of eigenvectors of $\lambda$ as $T$,

\[ \dim\ker(T^\top - \lambda) = \dim\operatorname{im}(T - \lambda)^\perp = \operatorname{codim}\operatorname{im}(T - \lambda) = \dim\ker(T - \lambda). \]

Consider those eigenvalues $\lambda$ for which $|\lambda| \geq \epsilon > 0$. Taking any list of them, $\lambda_n$
(distinct), choose a unit eigenvector $e_n$ for each, such that $\|e_n + [\![e_1, \ldots, e_{n-1}]\!]\| \geq \frac{1}{2}$
(Propositions 8.20 and 14.8). Hence, taking $n > m$, say,

\[ \|Te_n - Te_m\| = \|\lambda_n e_n - \lambda_m e_m\| = |\lambda_n| \Bigl\| e_n - \frac{\lambda_m}{\lambda_n} e_m \Bigr\| \geq \frac{|\lambda_n|}{2} \geq \frac{\epsilon}{2}. \]

Now the bounded set $\{e_1, e_2, \ldots\}$ is mapped to $\{Te_1, Te_2, \ldots\}$. If the first set is
infinite, the latter set would have no Cauchy subsequence, contradicting the
compactness of $T$. So the number of such eigenvectors, and corresponding eigenvalues,
is finite. The rest of the eigenvalues must be within $\epsilon$ of 0. By taking $\epsilon = 1/n \to 0$,
it follows that the number of non-zero eigenvalues is countable. □

To clarify, in finite dimensions, the set of eigenvalues is finite and need not include
0, but in infinite dimensions, 0 must be part of the spectrum (else $I = T^{-1}T$ is
compact). If there is an infinite sequence of non-zero eigenvalues, then $\lambda_n \to 0$, and
0 is an approximate eigenvalue. What remains to complete the theory is to find the
form of $T$ on each generalized eigenspace.

Proposition 14.19 Jordan Canonical Form 


On each finite-dimensional space $\ker(T - \lambda)^n$ ($\lambda \neq 0$) of a compact operator
$T$ on a Banach space $X$, there is a matrix of $T$ consisting of blocks on the
main diagonal, each of the type

\[ \begin{pmatrix} \lambda & 1 & & 0 \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda \end{pmatrix}. \]


Proof The operator $T$ can be split as $\lambda + (T - \lambda)$. The latter is nilpotent on the
subspace $\ker(T - \lambda)^n$ (finite dimensional since $(T - \lambda)^n$ is Fredholm), while $\lambda I$ is
diagonal. This is the claimed Jordan form, once it is shown that a nilpotent operator
has the following form.

A nilpotent operator on a finite-dimensional space can be represented by a matrix
of 0s except for 1s and 0s in the super-diagonal: Suppose $A$ is a nilpotent operator
of order $N$, $A^N = 0$; it has a descending sequence down to $N$, and an ascending
sequence up to $N$, $0 \subset \ker A \subset \cdots \subset \ker A^N$. For each non-zero vector $A^{N-1}u \in
\operatorname{im} A^{N-1}$ there is a sequence of vectors $e_1 := A^{N-1}u$, $e_2 := A^{N-2}u$, ..., $e_N := u$.
They are linearly independent because $e_i \in \ker A^i \setminus \ker A^{i-1}$, so to have $e_m \in
[\![e_1, \ldots, e_{m-1}]\!] \subseteq \ker A^{m-1}$ is impossible. Since $Ae_i = e_{i-1}$ and $Ae_1 = 0$, the
matrix of $A$ restricted to the space generated by these vectors is

\[ \begin{pmatrix} 0 & 1 & & 0 \\ & 0 & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{pmatrix}. \]

$A$ remains nilpotent on the rest of the space $\ker A^N / [\![e_1, \ldots, e_N]\!]$, with perhaps
a lower order. The same argument can be repeated to yield other sets of independent
vectors. As $X = \ker A^N$ is finite-dimensional, this process ends with a finite basis
for $X$ and the matrix of $A$ with respect to it consists of such blocks placed on the
diagonal. □

Examples 14.20 

1. The total number of $\lambda$s in a Jordan matrix, called its algebraic multiplicity, is the
dimension of $\ker(T - \lambda)^N$, the largest generalized eigenspace. The number of
Jordan blocks associated with $\lambda$ is $\dim\ker(T - \lambda)$, called the geometric
multiplicity of $\lambda$. The size of the largest Jordan block is sometimes called its (Jordan)
index. For example, the matrix below has an eigenvalue 2 with algebraic
multiplicity 4, geometric multiplicity 2, and index 3; the other eigenvalue 3 has
algebraic multiplicity 2, geometric multiplicity 1, and index 2.

\[ \begin{pmatrix} 2 & & & & & \\ & 2 & 1 & & & \\ & & 2 & 1 & & \\ & & & 2 & & \\ & & & & 3 & 1 \\ & & & & & 3 \end{pmatrix} \]

2. The set of $N \times N$ matrices with distinct eigenvalues is dense and open in $B(\mathbb{C}^N)$.

Proof Suppose a matrix $A$ has the Jordan-form matrix $A = D + C$ where $D$ is
diagonal with the eigenvalues $\lambda_1, \ldots, \lambda_r$ and $C$ is nilpotent. Alter each eigenvalue
slightly so $\lambda_i'$ are all distinct and let $A' := D' + C$; then $\|A' - A\| = \|D' - D\| =
\max_i |\lambda_i' - \lambda_i| < \epsilon$.

Because of this, the Jordan canonical form of a numerical matrix is impossible to 
calculate, due to the limited accuracy of the matrix coefficients; small changes in 
the coefficients result in a diagonal Jordan matrix with distinct eigenvalues. 
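The sensitivity can be seen already on a $2 \times 2$ Jordan block. In the sketch below a perturbation of size $\epsilon$ in the lower-left corner splits the double eigenvalue $\lambda$ into $\lambda \pm \sqrt{\epsilon}$, so the perturbed matrix is diagonalizable with distinct eigenvalues. The block, the size of the perturbation, and the function names are assumptions chosen for illustration.

```python
import cmath

def eigenvalues_2x2(M):
    """Both eigenvalues of a 2 x 2 matrix, from its characteristic polynomial."""
    (a, b), (c, d) = M
    mean = (a + d) / 2
    half_gap = cmath.sqrt(((a - d) / 2) ** 2 + b * c)
    return mean + half_gap, mean - half_gap

def perturbed_jordan_block(lam, eps):
    """The block [[lam, 1], [0, lam]] with its lower-left entry replaced by eps."""
    return [[lam, 1.0], [eps, lam]]

exact = eigenvalues_2x2(perturbed_jordan_block(2.0, 0.0))      # double eigenvalue 2
nearby = eigenvalues_2x2(perturbed_jordan_block(2.0, 1e-10))   # 2 + 1e-5 and 2 - 1e-5
gap = abs(nearby[0] - nearby[1])
```

A change of $10^{-10}$ in one coefficient moves the eigenvalues by $10^{-5}$, five orders of magnitude more, and replaces the single Jordan block by a diagonal form.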

Exercises 14.21 In these exercises, let K be a compact operator on a Banach 
space X. 

1. When T is 1-1, the ascending sequence of spaces are all 0. 

When T is onto, the descending sequence of spaces are all X. 

2. For the matrix CD , the ascending and descending sequences are the same. 

3. The left-shift operator L is onto and has an infinite ascending sequence; R is 
1-1 and has an infinite descending sequence. 

The operator $f(x) \mapsto xf(x)$ acting on $C[0, 1]$, is 1-1, and also has an infinite
descending sequence, e.g. each of the functions $1, x, x^2, \ldots$ belongs to a different
image space.

4. If $T$ has a finite descent then $T^\top$ has a finite ascent.

5. (a) Suppose that $\ker T^n \subseteq \operatorname{im} T$ for some $n$. Show that $T^{n-1}x = 0 \Rightarrow x = T^2 z$
so

\[ \ker T^{n-1} \subseteq \operatorname{im} T^2, \ldots, \ker T \subseteq \operatorname{im} T^n. \]

(b) Suppose $\ker T \subseteq \operatorname{im} T^n$ for some $n$, then $x \in \ker T^2 \Rightarrow x - Ty \in \ker T$
for some $y$, so

\[ \ker T^2 \subseteq \operatorname{im} T^{n-1}, \ldots, \ker T^n \subseteq \operatorname{im} T. \]

6. There is an eigenvalue at the spectral radius of $K$, except possibly when this
is 0.


7. In l the multiplier map M{a n ) := ( c n a n ) is compact when c„ — > 0; its eigen- 
values are c n . 0 is part of the continuous spectrum, unless it is an eigenvalue. 
For example, take c„ := 1 / n (and co := 1), and the shift operators L and R; 
then ML is also compact but has no eigenvalues except 0; RM is compact with 
no eigenvalues at all but 0 is part of the residual spectrum. 

8. (The original Fredholm alternative) For $\lambda \neq 0$, either $(K - \lambda)x = y$ has a
unique solution for each $y$ or $K^\top y = \lambda y$ has a non-trivial solution.

9. The minimal polynomial of each Jordan block is $(z - \lambda)^n$.

10. Cayley-Hamilton theorem : If p is the characteristic polynomial of a matrix T, 
then p(T) = 0. (Hint: consider the characteristic polynomial of each Jordan 
block.) 

14.4 The Functional Calculus 

The previous definition of $f(T)$ in Taylor’s theorem can be extended to functions that
are analytic on the spectrum of $T$, since, by Cauchy’s theorem, the path of integration
can be swept over analytic regions of $f$ and $(z - T)^{-1}$.

Definition 14.22


For any function $f : \mathbb{C} \to \mathbb{C}$ which is analytic in a neighborhood of $\sigma(T)$, let

\[ f(T) := \frac{1}{2\pi i} \oint f(z)(z - T)^{-1}\,dz, \]

where the path of integration is taken along simple closed curves enclosing
$\sigma(T)$ in a direction which keeps $\sigma(T)$ to its left.

Note that the integral is defined since $f(z)$ and $\|(z - T)^{-1}\|$ are continuous in $z$
on the selected compact path; hence
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Examples 14.23 

1. ► If $TS = SR$ then $f(T)S = Sf(R)$ when $f$ is analytic on a neighborhood of
$\sigma(T) \cup \sigma(R)$, since

\[ S(z - R) = (z - T)S, \]
\[ (z - T)^{-1}S = S(z - R)^{-1}, \]
\[ f(T)S = \frac{1}{2\pi i}\oint f(z)(z - T)^{-1}S\,dz = \frac{1}{2\pi i}\oint f(z)S(z - R)^{-1}\,dz = Sf(R). \]

In particular

(a) $f(S^{-1}TS) = S^{-1}f(T)S$; for example, $e^{S^{-1}TS} = S^{-1}e^{T}S$.

(b) $ST = TS$ implies $f(T)S = Sf(T)$ and $f(T)g(S) = g(S)f(T)$.

2. If $f \in C^\omega(\sigma(T))$ is zero on $\sigma(T)$, it does not follow that $f(T) = 0$, because $f(T)$
is defined in terms of a path-integral just outside $\sigma(T)$. For example, $T := \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$
has $\sigma(T) = \{0\}$, and $f(z) := z$ vanishes there, yet $f(T) = T \neq 0$.

3. * $f$ is differentiable (and continuous) at $T$: for $H$ sufficiently small, $f(T + H)$
is defined since $\sigma(T + H) \subseteq \sigma(T) + B_\epsilon(0)$ (Proposition 14.6), and

\[ f(T + H) = f(T) + \frac{1}{2\pi i}\oint f(w)(w - T)^{-1}H(w - T)^{-1}\,dw + o(H). \]
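
The contour-integral definition of $f(T)$ can be tried out numerically. The following is only a rough sketch under stated assumptions: $T$ is a $2 \times 2$ matrix, the path is a single circle of radius 2 about the origin, and the integral is approximated by the trapezoid rule; the helper names `resolvent` and `contour_f` are made up for this illustration.

```python
import cmath

def resolvent(z, T):
    """Return (z - T)^{-1} for a 2x2 matrix T given as nested lists."""
    a, b = z - T[0][0], -T[0][1]
    c, d = -T[1][0], z - T[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def contour_f(f, T, center=0.0, radius=2.0, n=400):
    """Approximate (1/2πi) ∮ f(z)(z - T)^{-1} dz on a circle (trapezoid rule)."""
    total = [[0j, 0j], [0j, 0j]]
    for k in range(n):
        w = cmath.exp(2j * cmath.pi * k / n)
        z = center + radius * w
        # dz = i * radius * w * dθ, so f(z) dz / (2πi) becomes f(z) * radius * w / n
        weight = f(z) * radius * w / n
        R = resolvent(z, T)
        for i in range(2):
            for j in range(2):
                total[i][j] += weight * R[i][j]
    return total

T = [[0.0, 1.0], [0.0, 0.0]]                 # nilpotent, spectrum {0}
identity_of_T = contour_f(lambda z: z, T)    # f(z) = z should reproduce T
exp_of_T = contour_f(cmath.exp, T)           # e^T = 1 + T because T^2 = 0
```

Here `identity_of_T` approximates $T$ itself, even though $f(z) = z$ vanishes on $\sigma(T) = \{0\}$, and `exp_of_T` approximates $1 + T$.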

The next theorem proves that all algebraic properties of a complex function are 
mirrored by properties of f(T). 

Theorem 14.24 The Functional Calculus 

Given $T \in \mathcal{X}$, the map $f \mapsto f(T)$, $C^\omega(\sigma(T)) \to \mathcal{X}$, satisfies

\[ (f + g)(T) = f(T) + g(T), \qquad (\lambda f)(T) = \lambda f(T), \]
\[ (fg)(T) = f(T)g(T), \qquad 1(T) = 1, \]
\[ f \circ g(T) = f(g(T)), \]
\[ f_n \to f \text{ in } C(\sigma(T) + B_\epsilon) \ (\exists \epsilon > 0) \ \Rightarrow \ f_n(T) \to f(T) \text{ in } \mathcal{X}. \]

Proof We have already seen part of this theorem in action when analyzing power
series. In particular, the cases $1 = \frac{1}{2\pi i}\oint (z - T)^{-1}\,dz$ and $T^{-1} = \frac{1}{2\pi i}\oint z^{-1}(z - T)^{-1}\,dz$
were covered (Example 13.30(2)).

(i) $(f + g)(T) = f(T) + g(T)$ and $(\lambda f)(T) = \lambda f(T)$ express the linearity
property of the integral.

(ii) $(fg)(T) = f(T)g(T)$: We require the identity

\[ (z - w)(z - T)^{-1}(w - T)^{-1} = (w - T)^{-1} - (z - T)^{-1}, \]

which follows easily from $z - w = (z - T) - (w - T)$. In the following analysis,
consider two paths around $\sigma(T)$, one (with variable $z$) nested inside another (with
variable $w$).

\[ f(T)g(T) = \frac{1}{(2\pi i)^2}\oint\!\!\oint f(z)g(w)(z - T)^{-1}(w - T)^{-1}\,dz\,dw \]
\[ = \frac{1}{(2\pi i)^2}\oint\!\!\oint f(z)g(w)\left( \frac{(w - T)^{-1}}{z - w} + \frac{(z - T)^{-1}}{w - z} \right) dz\,dw \]
\[ = \frac{1}{2\pi i}\oint g(w)(w - T)^{-1}\,\frac{1}{2\pi i}\oint f(z)(z - w)^{-1}\,dz\,dw + \frac{1}{2\pi i}\oint f(z)(z - T)^{-1}\,\frac{1}{2\pi i}\oint g(w)(w - z)^{-1}\,dw\,dz \]
\[ = \frac{1}{2\pi i}\oint f(z)(z - T)^{-1}g(z)\,dz \]
\[ = (fg)(T), \]

where we have changed the order of integration in the third line, and used the fact
that $(w - z)^{-1}$ leaves a residue when integrated on the outer path, but not when
integrated on the inner path (because the singularity at $w$ would then be outside the
path of integration).

In particular, note that if $f$ is invertible on a neighborhood of $\sigma(T)$,

\[ f(T)^{-1} = \frac{1}{2\pi i}\oint f(z)^{-1}(z - T)^{-1}\,dz. \qquad (14.1) \]


(iii) $f(g(T)) := \frac{1}{2\pi i}\oint f(z)(z - g(T))^{-1}\,dz$, where the right part of the integrand
is $(z - g(T))^{-1} = \frac{1}{2\pi i}\oint (z - g(w))^{-1}(w - T)^{-1}\,dw$ by (14.1). Combining the
two and using Cauchy's integral formula (Proposition 12.19), we get

\[ f(g(T)) = \frac{1}{(2\pi i)^2}\oint\!\!\oint f(z)(z - g(w))^{-1}(w - T)^{-1}\,dw\,dz \]
\[ = \frac{1}{(2\pi i)^2}\oint\!\!\oint f(z)(z - g(w))^{-1}\,dz\,(w - T)^{-1}\,dw \]
\[ = \frac{1}{2\pi i}\oint f \circ g(w)(w - T)^{-1}\,dw \]
\[ = f \circ g(T). \]

Note that $f$ has to be analytic on $\sigma(g(T))$ and $g\sigma(T)$ for $f(g(T))$ and $f \circ g(T)$
to be defined, but the two sets are equal by the next theorem (which only uses part
(ii) of this theorem).

(iv) The mapping is continuous, since $\|(z - T)^{-1}\|$ is bounded by some constant $c$
on the compact path enclosing the open set $U := \sigma(T) + B_\epsilon(0)$:

\[ \|f(T) - g(T)\| \le \frac{1}{2\pi}\oint |f(z) - g(z)|\,\|(z - T)^{-1}\|\,ds \le c\,\|f - g\|_{C(U)}. \qquad \Box \]

Theorem 14.25 Spectral Mapping Theorem 

The spectrum of $f(T)$ is equal to the set $\{ f(\lambda) : \lambda \in \sigma(T) \}$, that is,

\[ \sigma(f(T)) = f(\sigma(T)). \]

Proof For any $f$ analytic in a neighborhood of $\sigma(T)$:

(i) $\lambda \notin f\sigma(T) \Rightarrow \lambda \notin \sigma(f(T))$: Let $\lambda \neq f(z)$ for all $z \in \sigma(T)$; since $f\sigma(T)$ is
a closed set, there is a minimum distance between $\lambda$ and $f\sigma(T)$. So $(f(z) - \lambda)^{-1}$
is analytic on $\sigma(T) + B_\epsilon(0)$ if $\epsilon$ is small enough, and by the functional calculus
$(f(T) - \lambda)^{-1}$ exists. Thus $f(T) - \lambda$ is invertible.

(ii) $f(T) - f(\lambda)$ invertible $\Rightarrow$ $T - \lambda$ invertible: if $f(T) - f(\lambda)$ has an inverse $S$,
we see from rewriting $f(z) - f(\lambda) = (z - \lambda)F(z)$, and the functional calculus, that

\[ (T - \lambda)F(T)S = 1 = SF(T)(T - \lambda), \]

which implies that the factor $T - \lambda$ itself is invertible. This is justified once it is
shown that $F(z)$ is analytic about $\sigma(T)$; this is apparent when $z \neq \lambda$, but even so,

\[ f(z) = f(\lambda) + f'(\lambda)(z - \lambda) + \tfrac{1}{2}f''(\lambda)(z - \lambda)^2 + o(z - \lambda)^2, \]
\[ \Rightarrow \ F(z) = \frac{f(z) - f(\lambda)}{z - \lambda} = f'(\lambda) + \tfrac{1}{2}f''(\lambda)(z - \lambda) + o(z - \lambda), \]

meaning $F$ is analytic at $\lambda$. □
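
For matrices and polynomials the spectral mapping theorem can be checked by hand or by machine. The sketch below assumes a $2 \times 2$ upper triangular $T$ and the polynomial $f(z) = z^2 + 1$, both chosen only for illustration; `matmul` and `eigenvalues` are hypothetical helpers, the latter using the trace and determinant of a $2 \times 2$ matrix.

```python
import cmath

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def eigenvalues(A):
    """Eigenvalues of a 2x2 matrix from its trace and determinant, sorted."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return sorted([(tr - disc) / 2, (tr + disc) / 2],
                  key=lambda v: (v.real, v.imag))

def f(z):
    return z * z + 1

T = [[2, 1], [0, 3]]                      # spectrum {2, 3}
T2 = matmul(T, T)
fT = [[T2[i][j] + (1 if i == j else 0) for j in range(2)] for i in range(2)]

spectrum_of_fT = eigenvalues(fT)          # σ(f(T))
f_of_spectrum = sorted((f(lam) for lam in eigenvalues(T)),
                       key=lambda v: (v.real, v.imag))   # f(σ(T))
```

Both lists come out as the pair $\{5, 10\}$, in agreement with $\sigma(f(T)) = f(\sigma(T))$.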

Examples 14.26 

1. $\log T$ can be defined whenever there is a path, or "branch", connecting 0 to $\infty$
without meeting $\sigma(T)$, because in this case, $\log z$ can be defined and is analytic on
$\sigma(T)$. But note that $\log z$, and consequently $\log T$, depends on the actual branch
used.

When defined, $e^{\log T} = T$. Such elements must be in $\mathcal{G}_1$ (Proposition 13.24).

2. Similarly one can define $T^a := e^{a \log T}$ (again not uniquely); then $(T^{1/n})^n = T$
($n = 1, 2, \ldots$), and $T^{a+b} = T^a T^b$ (at least for $a$, $b$ real). By the spectral mapping
theorem, $\rho(T^a) = \rho(T)^a$ for $a \ge 0$.

3. If $T$ satisfies a polynomial $p(T) = 0$, then $\sigma(T)$ consists of the roots of the
minimal polynomial of $T$ (Example 13.3(12)).

Proof The spectral mapping theorem shows that $p(\sigma(T)) = 0$, i.e., that the spectrum
consists of roots of $p$. Conversely, if $\lambda$ is a root of the minimal polynomial,
$p(\lambda) = 0$, then $p(z) = (z - \lambda)^n q(z)$, so $0 = p(T) = (T - \lambda)^n q(T)$, where
$q(T) \neq 0$ and thus $T - \lambda$ is not invertible.

4. ► If $\lambda$ is an eigenvalue of $T \in B(X)$ then $f(\lambda)$ is an eigenvalue of $f(T)$, with
the same eigenvector.

Proof When $Tx = \lambda x$, then $(z - T)x = (z - \lambda)x$ and $(z - T)^{-1}x = (z - \lambda)^{-1}x$
($z \notin \sigma(T)$), so

\[ f(T)x = \frac{1}{2\pi i}\oint f(z)(z - T)^{-1}x\,dz = \frac{1}{2\pi i}\oint f(z)(z - \lambda)^{-1}x\,dz = f(\lambda)x. \]

Conversely suppose $f(T) - f(\lambda)$ is not 1-1. Take an open neighborhood $U \supset \sigma(T)$
in which $f$ is analytic. Then, either $f$ is constant on $U$, or else there are
only a finite number of $\lambda_i \in \sigma(T)$ satisfying $f(\lambda_i) = f(\lambda)$. So, for $z \in U$,
$f(z) - f(\lambda) = (z - \lambda_1) \cdots (z - \lambda_k)g(z)$ (where multiple roots are repeated) with
$g$ analytic and non-zero on $U$, and consequently

\[ f(T) - f(\lambda) = (T - \lambda_1) \cdots (T - \lambda_k)g(T). \]

But $f(T) - f(\lambda)$ is not 1-1, so there must be a $\lambda_i$ such that $T - \lambda_i$ is not 1-1
($g(T)$ is invertible), and $f(\lambda_i) = f(\lambda)$.

Proposition 14.27 

If $\sigma(T)$ disconnects into two closed sets $\sigma_1 \cup \sigma_2$, each surrounded by simple
closed paths in open neighborhoods of them, then

(i) $T = TP_1 + TP_2$, with $P_1$, $P_2$ (called spectral idempotents) such
that $1 = P_1 + P_2$, $P_iP_j = \delta_{ij}P_i$,

(ii) In the reduced algebras $P_1\mathcal{X}P_1$, $P_2\mathcal{X}P_2$ respectively,

\[ \sigma(TP_1) = \sigma_1, \qquad \sigma(TP_2) = \sigma_2. \]

Proof The disjoint closed sets $\sigma_1$ and $\sigma_2$ can be separated by disjoint open sets $U_1$,
$U_2$ (Exercise 5.7(5)). Consider the functions $\chi_i$ ($i = 1, 2$) which take the constant
value 1 on one open set $U_i \supset \sigma_i$, and 0 on the other. They are analytic on $U_1 \cup U_2$,
so we can define

\[ P_i := \chi_i(T) = \frac{1}{2\pi i}\oint_{\sigma_i} (z - T)^{-1}\,dz. \]

The path of integration is the union of the two paths surrounding $\sigma_1$ and $\sigma_2$, but one
of the two integrals vanishes.

$P_i$ are idempotents, $P_1P_2 = 0$, and $P_1 + P_2 = 1$, because $\chi_i^2 = \chi_i$, $\chi_1\chi_2 = 0$
and $\chi_1 + \chi_2 = 1$ on $U_1 \cup U_2 \supset \sigma(T)$.

Let $f_i(z) := z\chi_i(z)$; then $f_i(T) = TP_i$ and $\sigma(f_i(T)) = f_i(\sigma(T)) = \sigma_i \cup \{0\}$.
However, if we restrict to the reduced algebra $P_i\mathcal{X}P_i$, with unity $P_i$, this changes
slightly. Since $z - \lambda$ is invertible in $C^\omega(\sigma_i)$ if, and only if, $\lambda \notin \sigma_i$, it follows that
there exists an $S$ such that $S(T - \lambda)P_i = P_i = (T - \lambda)SP_i$ whenever $\lambda \notin \sigma_i$; this
means that $(T - \lambda)P_i$ is invertible in $P_i\mathcal{X}P_i$. Thus, $\sigma(TP_i) = \sigma_i$ in this algebra. □

Examples 14.28 

1. ► When the algebra is $B(X)$, $P_i$ are projections, and the spectral decomposition
of an operator $T$ into $TP_1$ and $TP_2$ also gives a decomposition of $X = X_1 \oplus X_2$
where $X_i = \operatorname{im} P_i$ are $T$-invariant, and $\sigma(T|_{X_i}) = \sigma_i$. (Theorem 13.8(11))

2. If 0 is an isolated point of $\sigma(T)$, with spectral idempotent $P$, then there is a
Laurent expansion

\[ (z - T)^{-1}P = Pz^{-1} + TPz^{-2} + T^2Pz^{-3} + \cdots. \]

3. If $0 \notin \sigma_1$, then $P_1 = T\left( \frac{1}{2\pi i}\oint_{\sigma_1} z^{-1}(z - T)^{-1}\,dz \right)$. For example, when $T$ is a compact
operator and $\lambda \neq 0$ is an isolated point of $\sigma(T)$, then the projection $P_\lambda$ is also
compact, confirming that the eigenspace of $\lambda$ is finite-dimensional.
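
The spectral idempotents of Proposition 14.27 can be approximated for a small matrix by integrating the resolvent around each part of the spectrum. This is only an illustrative sketch: the matrix $T$ with $\sigma(T) = \{1, 3\}$, the circles of radius 1, and the helper names `resolvent`, `spectral_idempotent` and `matmul` are assumptions made for the example.

```python
import cmath

def resolvent(z, T):
    """Return (z - T)^{-1} for a 2x2 matrix T given as nested lists."""
    a, b = z - T[0][0], -T[0][1]
    c, d = -T[1][0], z - T[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def spectral_idempotent(T, center, radius, n=200):
    """Approximate (1/2πi) ∮ (z - T)^{-1} dz over the circle |z - center| = radius."""
    P = [[0j, 0j], [0j, 0j]]
    for k in range(n):
        w = cmath.exp(2j * cmath.pi * k / n)
        R = resolvent(center + radius * w, T)
        for i in range(2):
            for j in range(2):
                P[i][j] += radius * w * R[i][j] / n   # dz / (2πi) on the circle
    return P

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

T = [[1.0, 1.0], [0.0, 3.0]]                          # spectrum {1} ∪ {3}
P1 = spectral_idempotent(T, center=1.0, radius=1.0)   # encloses only 1
P2 = spectral_idempotent(T, center=3.0, radius=1.0)   # encloses only 3
```

Up to rounding, `P1` is $\begin{pmatrix} 1 & -1/2 \\ 0 & 0 \end{pmatrix}$, and one can check $P_1^2 = P_1$, $P_1P_2 = 0$ and $P_1 + P_2 = 1$ with `matmul`.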


Exercises 14.29 


1. The non-trivial idempotents have spectrum $\{0, 1\}$, and the nilpotents have spectrum
$\{0\}$. What can the spectrum of a cyclic element be?

2. If $f$ takes the value 0 inside $\sigma(T)$ then $f(T)$ is not invertible.

3. Use the spectral mapping theorem to show that if $e^T = 1$ then $\sigma(T) \subseteq 2\pi i\mathbb{Z}$. If
$P$ is an idempotent, then $e^{2\pi i P} = 1$.


4. If $J$ is a Banach algebra morphism, then $f(J(T)) = J(f(T))$ (recall $\sigma(J(T)) \subseteq \sigma(T)$).

5. Show directly that the matrix

has no square root at all.

The shift operators on $\ell^2$, say, cannot have a square root because their spectrum
encloses 0 (even on $\ell^1(\mathbb{Z})$ when $L$ and $R$ are invertible). Prove this directly by
showing the contradictions

(a) if $T^2 = L$, then $T$ must be onto and $\ker T = \ker L = [\![e_0]\!]$, so $e_0 = \alpha Te_0 = 0$;

(b) if $T^2 = R$, then $T$ is 1-1, and $\operatorname{im} T = \operatorname{im} R$, so $TRx = RTx =
(0, 0, y_0, \ldots)$.

6. A simple linear electronic circuit with feedback can be modeled as an operator,
transforming an input signal $x = (x_n)$ to an output signal $y = (y_n)$ such that

\[ y_n = bx_n - a_1y_{n-1} - \cdots - a_ry_{n-r}, \]

where $b$, $a_i$ are parameters determined by the circuit. Equivalently,

\[ (1 + a_1R + \cdots + a_rR^r)y = bx, \]

where $R$ is the right-shift operator. To avoid the once-familiar feedback loop
instability, it is desired that the values $y_n$ do not grow of their own accord, meaning
that $1 + a_1R + \cdots + a_rR^r$ has a continuous inverse. This is the case when the
roots of the polynomial $1 + a_1z + \cdots + a_rz^r$ all have magnitude greater than 1.
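
The stability criterion in Exercise 6 can be seen in a small simulation. The sketch below assumes a single feedback coefficient ($r = 1$), $b = 1$, and an impulse input; the function name `feedback_response` and the chosen coefficients are illustrative only.

```python
def feedback_response(a, b, x):
    """Compute y_n = b*x_n - a_1*y_{n-1} - ... - a_r*y_{n-r}, taking y_n = 0 for n < 0."""
    y = []
    for n, xn in enumerate(x):
        acc = b * xn
        for j, aj in enumerate(a, start=1):
            if n - j >= 0:
                acc -= aj * y[n - j]
        y.append(acc)
    return y

impulse = [1.0] + [0.0] * 59

# 1 + 0.5 z has its root at z = -2 (magnitude > 1): the response decays.
stable = feedback_response([0.5], 1.0, impulse)

# 1 + 2 z has its root at z = -0.5 (magnitude < 1): the response blows up.
unstable = feedback_response([2.0], 1.0, impulse)
```

With these coefficients the impulse responses are $(-0.5)^n$ and $(-2)^n$ respectively, so only the first stays bounded.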


14.5 The Gelfand Transform 

Quasinilpotents and the Radical 
Definition 14.30 

The quasinilpotents are those elements $Q$ with $\rho(Q) = 0$. The (Jacobson)
radical $\mathcal{J}$ of $\mathcal{X}$ is

\[ \mathcal{J} := \{\, Q \in \mathcal{X} : \forall T \in \mathcal{X},\ \rho(TQ) = 0 \,\}. \]

A Banach algebra with a trivial radical is called semi-primitive or semi-simple. 

The next proposition shows that the radical is a closed ideal, which can be factored 
out to leave a semi-simple Banach algebra. 

Examples 14.31 

1. The prime examples of quasinilpotents are the nilpotents, defined as those elements
which satisfy $Q^n = 0$ for some $n$, so $\rho(Q) \le \|Q^n\|^{1/n} = 0$; e.g.

2. Every operator $Tf(x) := \int_0^x k(x, y)f(y)\,dy$ on $C[0, 1]$, where $k \in L^\infty[0, 1]^2$,
is a quasinilpotent.

Proof $|Tf(x)| \le \int_0^x |k(x, y)||f(y)|\,dy \le \|k\|\|f\|x$. By induction one can conclude
$|T^nf(x)| \le \|k\|^n\|f\|x^n/n!$,

\[ |T^{n+1}f(x)| = \left| \int_0^x k(x, y)T^nf(y)\,dy \right| \]
\[ \le \int_0^x \|k\|^{n+1}\|f\|y^n/n!\,dy \]
\[ \le \|k\|^{n+1}\|f\|x^{n+1}/(n + 1)! \]

so $\|T^n\| \le \|k\|^n/n!$ and $\rho(T) \le \|T^n\|^{1/n} \le \|k\|/\sqrt[n]{n!} \to 0$.

3. The sum and product of quasinilpotents need not be quasinilpotents, e.g. $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ and $\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$.



4. The quasinilpotents are topological divisors of zero since their spectrum is a
boundary point. Idempotents (except 0 and 1) are divisors of zero but not
quasinilpotents.

5. Radical elements are obviously quasinilpotents, $\rho(Q) = \rho(1Q) = 0$.

6. It is enough to show that $1 \notin \sigma(TQ)$ for all $T$, in order that $Q \in \mathcal{J}$.

Proof For any $\lambda \neq 0$, $1 \notin \sigma(TQ/\lambda) = \sigma(TQ)/\lambda \Rightarrow \lambda \notin \sigma(TQ)$.

7. ► For any $T \in \mathcal{X}$, $Q \in \mathcal{J}$, $\sigma(T + Q) = \sigma(T)$.

Proof For any invertible $S$, the sum $S + Q = S(1 + S^{-1}Q)$ is also invertible,
since $\rho(S^{-1}Q) = 0$ (Theorem 13.20). Thus

\[ \lambda \notin \sigma(T + Q) \Leftrightarrow T + Q - \lambda \text{ is invertible} \Leftrightarrow T - \lambda \text{ is invertible} \Leftrightarrow \lambda \notin \sigma(T). \]

8. $B(X)$ has nilpotents (except for $X = \mathbb{C}$) but only a trivial radical.

Proof For any $Q \neq 0$, an operator $T$ can be found such that $1 - TQ$ is non-invertible,
so $1 \in \sigma(TQ)$. One such operator is $T := x\phi$, where $Qx \neq 0$, $\phi \in X^*$,
$\phi Qx = 1$; then $(1 - TQ)x = x - x\phi Qx = 0$ but $x \neq 0$.
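
Example 14.31(3), that sums and products of quasinilpotents need not be quasinilpotent, can be verified on a standard pair of nilpotent $2 \times 2$ matrices. The pair below is a common textbook choice and is assumed here for illustration (it need not be the author's own example); `matmul` and `spectral_radius` are hypothetical helpers valid only for $2 \times 2$ matrices.

```python
import cmath

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def spectral_radius(A):
    """Largest eigenvalue modulus of a 2x2 matrix, via trace and determinant."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr - disc) / 2), abs((tr + disc) / 2))

N1 = [[0, 1], [0, 0]]          # N1^2 = 0
N2 = [[0, 0], [1, 0]]          # N2^2 = 0

total = [[N1[i][j] + N2[i][j] for j in range(2)] for i in range(2)]
product = matmul(N1, N2)

radii = {
    "N1": spectral_radius(N1),
    "N2": spectral_radius(N2),
    "N1 + N2": spectral_radius(total),      # eigenvalues +1 and -1
    "N1 N2": spectral_radius(product),      # eigenvalues 1 and 0
}
```

Both nilpotents have spectral radius 0, whereas their sum and their product have spectral radius 1, so neither matrix can belong to the radical of $B(\mathbb{C}^2)$.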

Proposition 14.32 


The radical is a closed ideal. 


Proof $\mathcal{J}$ is contained in every maximal left-ideal: Recall that a maximal left-ideal
is closed and that every proper left-ideal can be enlarged to a maximal left-ideal
(Examples 13.5(7,8)). Let $Q \in \mathcal{J}$, and let $\mathcal{M}$ be a maximal left-ideal. Then $\mathcal{M} + \mathcal{X}Q$
is a left-ideal which contains $\mathcal{M}$. Either

(a) $\mathcal{M} + \mathcal{X}Q = \mathcal{X}$, in which case $1 = R + TQ$ for some $R \in \mathcal{M}$, $T \in \mathcal{X}$, so that
$R = 1 - TQ$ is invertible, contradicting $R \in \mathcal{M}$ (Example 13.5(5)); or else,

(b) $\mathcal{M} + \mathcal{X}Q = \mathcal{M}$, in which case $Q = 0 + 1Q \in \mathcal{M}$.

Thus $\mathcal{J} \subseteq \mathcal{M}$ as required; an analogous argument shows that $\mathcal{J}$ is contained in
every maximal right-ideal.

$\mathcal{J}$ is the intersection of the maximal left-ideals: Let $P$ be an element that is
contained in every maximal left-ideal. For any $T \in \mathcal{X}$, the left-ideal $\mathcal{X}(1 - TP)$
cannot be proper, otherwise it would lie inside some maximal left-ideal $\mathcal{M}$, forcing
$P \in \mathcal{M}$, and $TP \in \mathcal{M}$, and so $1 = TP + (1 - TP) \in \mathcal{M}$, a contradiction. Hence
$\mathcal{X}(1 - TP) = \mathcal{X}$, and there is an $S$ such that $S(1 - TP) = 1$.

To show $1 - TP$ is invertible we need to prove $(1 - TP)S = 1$ as well. To this
end one can substitute $-ST$ for $T$ in the above argument, to conclude that there is
an $R \in \mathcal{X}$ such that

\[ 1 = R(1 + STP) = R(S + 1 - S(1 - TP)) = RS. \]

But $RS = 1 = S(1 - TP)$ implies $1 - TP = S^{-1}$ is invertible. With $1 \notin \sigma(TP)$
for any $T$, $P$ must be in the radical.

$\mathcal{J}$ is a closed ideal: Being the intersection of closed sets, $\mathcal{J}$ is also closed
(Proposition 2.18). For any $S, T \in \mathcal{X}$ and $Q, Q' \in \mathcal{J}$,

(a) $\rho(STQ) = 0 = \rho(SQT)$, so $TQ, QT \in \mathcal{J}$,

(b) $\sigma(T(Q + Q')) = \sigma(TQ) = \{0\}$ from Example 14.31(7) above ($TQ' \in \mathcal{J}$), so
$Q + Q' \in \mathcal{J}$,

(c) $\rho(T(\lambda Q)) = |\lambda|\rho(TQ) = 0$, so $\lambda Q \in \mathcal{J}$,

and $\mathcal{J}$ is an ideal. □

The State Space
Definition 14.33

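The state space of a Banach algebra $\mathcal{X}$ is the set of functionals
\[ \mathcal{S}(\mathcal{X}) := \{\, \phi \in \mathcal{X}^* : \phi 1 = 1 = \|\phi\| \,\}. \]

We often write $\mathcal{S}$ for $\mathcal{S}(\mathcal{X})$ and $\mathcal{S}(T) := \{\, \phi T \in \mathbb{C} : \phi \in \mathcal{S}(\mathcal{X}) \,\}$, for example,
$\mathcal{S}(1) = \{1\}$.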

Proposition 14.34 

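The state space $\mathcal{S}(\mathcal{X})$ is a convex set containing the character space $\Delta(\mathcal{X})$.
For any $T \in \mathcal{X}$, $\mathcal{S}(T)$ is a compact convex subset of $\mathbb{C}$, and

\[ \Delta(T) \subseteq \sigma(T) \subseteq \mathcal{S}(T). \]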


Proof (i) $\mathcal{S}(\mathcal{X})$ and $\mathcal{S}(T)$ are convex: For $\phi, \psi \in \mathcal{S}$ and $0 \le t \le 1$,

\[ (t\phi + (1 - t)\psi)1 = t + 1 - t = 1, \text{ and} \]
\[ \|t\phi + (1 - t)\psi\| \le t\|\phi\| + (1 - t)\|\psi\| = 1. \]

It follows from $t\phi T + (1 - t)\psi T = (t\phi + (1 - t)\psi)T \in \mathcal{S}(T)$ that $\mathcal{S}(T)$ is convex.

$\mathcal{S}(T)$ is compact: $\mathcal{S}(T)$ is bounded since $|\phi T| \le \|T\|$ for any $\phi \in \mathcal{S}$. Now recall
that every bounded sequence in $\mathcal{X}^*$ has a weak*-convergent subsequence (Theorem
11.40 for $\mathcal{X}$ separable). So whenever $\phi_nT \in \mathcal{S}(T)$ converges to a limit point $z$, there
is a subsequence of $\phi_n$ that converges in the weak* sense, $\phi_{n_i} \rightharpoonup \phi \in \mathcal{X}^*$, implying

(a) $\phi_{n_i}T \to \phi T = z$ and $1 = \phi_{n_i}1 \to \phi 1$,

(b) $\|\phi\| \le \liminf_i \|\phi_{n_i}\| = 1$ (Corollary 11.35).

Hence $\phi \in \mathcal{S}$ and $z \in \mathcal{S}(T)$, that is, $\mathcal{S}(T)$ is closed and bounded.

(ii) $\sigma(T) \subseteq \mathcal{S}(T)$: If $S \in \mathcal{X}$ is not invertible, then $1 \notin [\![S]\!]$; indeed $d(1, [\![S]\!]) = 1$
as $[\![S]\!]$ contains no invertible elements (Theorem 13.20). So by the Hahn-Banach
theorem, there is a $\phi \in \mathcal{X}^*$ satisfying $\phi 1 = 1 = \|\phi\|$ and $\phi S = 0$ (Proposition
11.18). In particular, for $S = T - \lambda$, where $\lambda \in \sigma(T)$, there is a $\phi \in \mathcal{S}$ such that

\[ 0 = \phi(T - \lambda) = \phi T - \lambda, \]

so $\lambda = \phi T \in \mathcal{S}(T)$.






(iii) $\Delta(T) \subseteq \sigma(T)$: Recall that any character $\psi \in \Delta$ maps invertible elements
to invertible complex numbers (Example 13.7(1)), including $\psi 1 = 1$. So for any
$\lambda \notin \sigma(T)$, $\psi T - \lambda = \psi(T - \lambda) \neq 0$, and $\lambda \notin \Delta(T)$. Equivalently, $\Delta(T) \subseteq \sigma(T)$
and $|\psi T| \le \rho(T) \le \|T\|$. This means that $\psi$ is automatically continuous with
$\|\psi\| = 1$, and so $\Delta \subseteq \mathcal{S}$. □

Examples 14.35 

1. ► The characters of $\ell^1$ are of the type $\psi(a_n) = \sum_{n=0}^\infty a_nz^n$, where $|z| \le 1$
depends on $\psi$.

Proof Let $\psi \in \Delta \subseteq \ell^{1*} = \ell^\infty$ (Proposition 9.6); then every sequence in $\ell^1$ can
be written as

\[ x = (a_0, a_1, \ldots) = \sum_{n=0}^\infty a_ne_n = \sum_{n=0}^\infty a_n(\underbrace{e_1 * \cdots * e_1}_{n}), \]

\[ \psi x = \sum_{n=0}^\infty a_n\psi(e_1 * \cdots * e_1) = \sum_{n=0}^\infty a_nz^n, \qquad (z := \psi e_1), \]

where the multiplicative property $\psi(e_1 * \cdots * e_1) = (\psi e_1)^n$ was used. The
requirement $1 = \|\psi\| = \|(z^n)\|_{\ell^\infty}$ implies $|z| \le 1$, else $|z|^n$ would grow beyond
1 as $n \to \infty$.

2. The characters of $\ell^1(\mathbb{Z})$ are $\psi_\theta(a_n) = \sum_{n \in \mathbb{Z}} a_ne^{in\theta}$.

The proof is the same as above except $|z| = 1$, that is, $z = e^{i\theta} \in S^1$ for some
$0 \le \theta < 2\pi$.

3. For $L^1(S^1)$, the characters are $\psi_n(f) = \int_0^{2\pi} e^{in\theta}f(\theta)\,d\theta$, ($n \in \mathbb{Z}$).

Proof Let $\psi \in \Delta \subseteq L^1(S^1)^* = L^\infty(S^1)$, so $\psi(f) = \int_0^{2\pi} h(\theta)f(\theta)\,d\theta$ for
some $h \in L^\infty(S^1)$. Recall that $L^1(A)$ does not contain a unity for convolution
(Example 13.3(5)); nevertheless, one can be added artificially, so $\Delta$ exists and its
characters act on $L^1(A)$. Again we require

(a) $1 = \|\psi\| = \|h\|_{L^\infty}$, so $|h(\theta)| \le 1$ for almost all $\theta$;

(b) $\psi(f * g) = \psi(f)\psi(g)$, or equivalently,

\[ \int_0^{2\pi} h(\theta)\int_0^{2\pi} f(\theta - \eta)g(\eta)\,d\eta\,d\theta = \int_0^{2\pi} h(\theta)f(\theta)\,d\theta \int_0^{2\pi} h(\eta)g(\eta)\,d\eta. \]

This implies that $h(\theta + \eta) = h(\theta)h(\eta)$ a.e.; we've met this identity before in
our preliminary discussion on the exponential function in Section 13.2, where we
concluded that $h(\theta) = h(1)^\theta = e^{z\theta}$, assuming $h$ is continuous. That this can be
taken to be the case follows from Corollary 9.22,

\[ \left| \int (h(y + \epsilon) - h(y))f(y)\,dy \right| = \left| \int h(y)(f(y - \epsilon) - f(y))\,dy \right| \le \int |f(y - \epsilon) - f(y)|\,dy \to 0. \]

Moreover, $h(2\pi) = h(0) = 1$ implies that $h(1) = e^{in}$ for some $n \in \mathbb{Z}$.

4. For $L^1(\mathbb{R})$, the characters are $\psi_\xi(f) = \int_{\mathbb{R}} e^{ix\xi}f(x)\,dx$, ($\xi \in \mathbb{R}$).

Proof Let $\psi \in \Delta \subseteq L^1(\mathbb{R})^* = L^\infty(\mathbb{R})$; so $\psi(f) = \int h(x)f(x)\,dx$. As before,
$|h(x)| \le 1$ for all $x$, while the condition $\psi(f * g) = \psi(f)\psi(g)$ is equivalent to
$h(x + y) = h(x)h(y)$ a.e., so $h(x) = h(1)^x$. To avoid $h(x)$ growing arbitrarily
large as $x \to \pm\infty$, $|h(1)|$ must be 1, and $h(x) = e^{ix\xi}$.

5. Repeating for $L^1(\mathbb{R}^+)$, $\Delta = \{\, e^{-zx} : \operatorname{Re} z \ge 0 \,\}$.

6. * For $C[0, 1]$, $\Delta = \{\, \delta_x \in C[0, 1]^* : \delta_x(f) = f(x),\ x \in [0, 1] \,\} \cong [0, 1]$.

Proof That $\delta_x$ are functionals (with unit norm) is Example 8.6(6). In addition,

\[ \delta_x(fg) = (fg)(x) = f(x)g(x) = \delta_x(f)\delta_x(g), \text{ and } \delta_x(1) = 1. \]

Note that for $x \neq y$, $\delta_x(f) \neq \delta_y(f)$ for some $f \in C[0, 1]$.

For the converse, let $\psi$ be a character of $C[0, 1]$. Define 'triangle' functions, $\tau_{n,i}(x)$,
as in the accompanying plot; note that these functions overlap and sum to 1 everywhere.

Then $1 = \psi 1 = \sum_i \psi(\tau_{n,i})$ and at least one triangle function must give
$\psi(\tau_{n,i_n}) \neq 0$. In fact, $\psi(\tau_{n,i}) = 0$ for $i \neq i_n - 1, i_n, i_n + 1$, since $\tau_{n,i}\tau_{n,i_n} = 0$. By
taking larger values of $n$, and selected values of $i_n$, the nested intervals supporting
$\tau_{n,i_n-1}, \tau_{n,i_n}, \tau_{n,i_n+1}$ shrink to some point $x$. For any function $f \in C[0, 1]$,

\[ \psi f = \sum_i \psi(\tau_{n,i}f) = \sum_{i=i_n-1}^{i_n+1} \psi(\tau_{n,i}f) \to f(x), \quad \text{as } n \to \infty. \]

The map $x \mapsto \delta_x$ is thus 1-1 and onto $\Delta$. Furthermore $x_n \to x \Leftrightarrow \delta_{x_n} \rightharpoonup \delta_x$,
since the latter means $f(x_n) \to f(x)$ for all $f \in C[0, 1]$, in particular for the
identity function $f(x) := x$.

7. * The character space of the Banach algebra $\mathbb{C}[T_1, \ldots, T_n]$ generated by commuting
elements, is isomorphic to a compact subset of $\mathbb{C}^n$ (use the map $\psi \mapsto (\psi T_1, \ldots, \psi T_n)$).

8. * The character space is weakly closed, i.e., $\psi_n \in \Delta$ AND $\psi_n \rightharpoonup \psi$ $\Rightarrow$ $\psi \in \Delta$.
Consequently, for a separable Banach algebra, $\Delta$ is a compact metric space.

Israel Gelfand (1913-2009) studied functional analysis at the
University of Moscow under Kolmogorov in 1935, specializing
in commutative normed rings. During 1939-41 he studied Banach
algebras, introducing his transform and proving the spectral
radius formula, which gave much impetus to the subject; in
1943, with Naimark, he proved the embedding of special commutative
*-algebras into $B(H)$; and then in 1948 he simplified
the subject-matter with the introduction of the C*-condition
$\|x^*x\| = \|x\|^2$.

Fig. 14.2 Gelfand

Proof Taking the limits of $\psi_n(S + T) = \psi_nS + \psi_nT$, $\psi_n(\lambda T) = \lambda\psi_nT$,
$\psi_n(ST) = (\psi_nS)(\psi_nT)$, and $\psi_n1 = 1$, shows that $\psi$ is an algebraic morphism.
Also $|\psi_nT| \le \|T\|$ becomes $|\psi T| \le \|T\|$ in the limit $n \to \infty$, and $\psi$ is continuous.
For a separable Banach algebra, the unit ball in $\mathcal{X}^*$ is compact with respect
to the weak*-metric (Theorem 11.40), and so is its weakly closed subset $\Delta$.

The Gelfand Transform 

To see why characters may be useful, consider the algebra $\ell^1$ and its characters $\psi_z$.
A sequence such as $x = (1/2, 1/4, 1/8, \ldots)$ can be encoded as a complex power
series in terms of its characters, $\psi_z(x) = \sum_n z^n/2^{n+1} = (2 - z)^{-1}$. Then the
convolution product $x * \cdots * x$ can be evaluated using characters instead of working
it out directly.

\[ \psi_z(x * \cdots * x) = \psi_z(x)^N = \frac{1}{(2 - z)^N} = \sum_{n=0}^\infty \frac{N(N + 1) \cdots (N + n - 1)}{n!\,2^{N+n}}\,z^n. \]

For an example from probability theory, consider a random variable that outputs a
natural number $n = 0, 1, 2, \ldots$, with probability $1/2^{n+1}$. The probability distribution
of the sum of $N$ such random outputs is $x * \cdots * x$, which can be read off from
the coefficients of $\psi_z(x)^N$; e.g. the probability of getting a total of, say 2, after $N$
trials is $N(N + 1)/2^{N+3}$. Further, the mean of such a sum of random variables is
given by differentiating $(2 - z)^{-N}$ at $z = 1$, that is $N$. The key step is to consider
$\psi_z(x)$ as a function of $z$. Its generalization leads to:
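
The coefficient formula can be checked against a direct convolution. The following sketch uses exact rational arithmetic and truncates all sequences to their first few terms (the truncation length `terms` and the value `N = 3` are arbitrary choices for illustration); truncation is harmless here because the $n$-th coefficient of a convolution only involves coefficients of index at most $n$.

```python
from fractions import Fraction
from math import factorial

def convolve(x, y, terms):
    """First `terms` coefficients of the convolution of two sequences indexed from 0."""
    return [sum(x[k] * y[n - k] for k in range(n + 1)) for n in range(terms)]

def closed_form(N, n):
    """Coefficient of z^n in (2 - z)^(-N): N(N+1)...(N+n-1) / (n! 2^(N+n))."""
    rising = 1
    for j in range(n):
        rising *= N + j
    return Fraction(rising, factorial(n) * 2 ** (N + n))

terms, N = 8, 3
x = [Fraction(1, 2 ** (n + 1)) for n in range(terms)]   # x = (1/2, 1/4, 1/8, ...)

power = x
for _ in range(N - 1):                                  # x * x * x for N = 3
    power = convolve(power, x, terms)

expected = [closed_form(N, n) for n in range(terms)]
```

For $N = 3$ the coefficient of index 2 is $N(N+1)/2^{N+3} = 12/64 = 3/16$, the probability that three such random outputs sum to 2.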

Definition 14.36 




The Gelfand transform of $T$ is the map $\hat T : \Delta(\mathcal{X}) \to \sigma(T)$ defined by
$\hat T(\psi) := \psi T$.

The element $T$ is transformed into a function on the compact space $\Delta$. The algebraic
structure is preserved, but the transform is generally neither 1-1 nor onto.

Proposition 14.37

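The Gelfand transform $\mathcal{G} : T \mapsto \hat T$ is a Banach algebra morphism
$\mathcal{X} \to C(\Delta)$,

\[ \widehat{S + T} = \hat S + \hat T, \qquad \widehat{\lambda T} = \lambda\hat T, \]
\[ \hat 1 = 1, \qquad \widehat{ST} = \hat S\hat T, \qquad \|\hat T\| \le \|T\|. \]

Its kernel $\ker \mathcal{G}$ contains the quasinilpotents and the commutators.

For any analytic function on the spectrum of $T$, $f \in C^\omega(\sigma(T))$,

\[ \widehat{f(T)} = f \circ \hat T. \]

Proof It is clear from

\[ |\hat T(\phi) - \hat T(\psi)| = |\phi T - \psi T| \le \|\phi - \psi\|\|T\|, \]
\[ \text{and } |\hat T(\psi)| = |\psi T| \le \|T\|, \text{ for all } \phi, \psi \in \Delta, \]

that $\hat T$ is a (continuous) Lipschitz and bounded function on $\Delta$, with $\|\hat T\|_C \le \|T\|$.
For any $\psi \in \Delta$, we have:

\[ \hat 1(\psi) = \psi 1 = 1, \]
\[ \widehat{\lambda T}(\psi) = \psi(\lambda T) = \lambda\psi T = \lambda\hat T(\psi), \]
\[ \widehat{S + T}(\psi) = \psi(S + T) = \psi S + \psi T = (\hat S + \hat T)(\psi), \]
\[ \widehat{ST}(\psi) = \psi(ST) = \psi S\,\psi T = \hat S(\psi)\hat T(\psi) = (\hat S\hat T)(\psi). \]

Clearly, from $\hat T(\psi) = \psi T$, $\hat T = 0 \Leftrightarrow \Delta(T) = \{0\}$. If $Q$ is a quasinilpotent then
$\Delta(Q) \subseteq \sigma(Q) = \{0\}$. Also, $\widehat{[S, T]} = \hat S\hat T - \hat T\hat S = 0$ since $C(\Delta)$ is commutative.
Lastly, as $\psi(S^{-1}) = (\psi S)^{-1}$, for any $\psi \in \Delta$ and invertible $S \in \mathcal{X}$,

\[ \widehat{f(T)}(\psi) = \psi f(T) = \psi\left( \frac{1}{2\pi i}\oint f(z)(z - T)^{-1}\,dz \right) \]
\[ = \frac{1}{2\pi i}\oint f(z)(z - \psi T)^{-1}\,dz \qquad (\psi T \in \sigma(T)) \]
\[ = f(\psi T) = f \circ \hat T(\psi). \qquad \Box \]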


We cannot expect the Gelfand transform to be very useful for general algebras
as it loses information by representing $\mathcal{X}$ as a subspace of the special commutative
algebra $C(\Delta)$; for example, $\widehat{S^{-1}TS} = \hat S^{-1}\hat T\hat S = \hat T$. But for commutative Banach
algebras the situation is much improved:

Theorem 14.38 

For a commutative Banach algebra,

\[ \operatorname{im} \hat T = \Delta(T) = \sigma(T), \qquad \|\hat T\|_{C(\Delta)} = \rho(T), \qquad \ker \mathcal{G} = \mathcal{J}. \]

Proof Any maximal ideal of a commutative Banach algebra is the kernel of some
character: Given a closed ideal $\mathcal{M}$, the mapping $\Phi(T) := T + \mathcal{M}$ is a Banach
algebra morphism $\mathcal{X} \to \mathcal{X}/\mathcal{M}$ with $\mathcal{M} = \ker \Phi$ (Exercise 13.10(19)). By Exercise
13.10(18), when $\mathcal{M}$ is also maximal in $\mathcal{X}$, then $\mathcal{X}/\mathcal{M}$ has no non-trivial ideals, and
so is isomorphic to $\mathbb{C}$ (Example 14.5(4)). Hence $\Phi : \mathcal{X} \to \mathcal{X}/\mathcal{M} \cong \mathbb{C}$ is a character.

But any non-invertible $T$ belongs to some maximal ideal $\mathcal{M}$ (Example 13.5(8));
so there must be some $\psi \in \Delta$ such that $\mathcal{M} = \ker \psi$, implying $\psi T = 0$. Thus $T - \lambda$
is not invertible if, and only if, there is a $\psi \in \Delta$, with $\psi T - \lambda = \psi(T - \lambda) = 0$,
i.e., $\lambda \in \Delta(T)$, and therefore $\Delta(T) = \sigma(T)$. (Note that this shows the existence of
characters in a commutative Banach algebra.) Since the two sets are the same, they
have the same greatest extent,

\[ \|\hat T\|_C = \max_{\psi \in \Delta} |\psi T| = \rho(T). \]

The quasinilpotents are in the radical: If $Q$ is a quasinilpotent, and $T \in \mathcal{X}$, then

\[ \rho(TQ) = \lim_{n \to \infty} \|(TQ)^n\|^{1/n} = \lim_{n \to \infty} \|T^nQ^n\|^{1/n} \le \rho(T)\rho(Q) = 0, \]

so $Q$ is in the radical. Moreover, $\ker \mathcal{G} = \mathcal{J}$ since

\[ \hat T = 0 \Leftrightarrow \Delta(T) = \{0\} \Leftrightarrow \sigma(T) = \{0\}. \]

Proposition 14.39 

A Banach algebra which satisfies, for some $c > 0$ and all $T$,

\[ \|T\|^2 \le c\|T^2\|, \]

can be embedded in the commutative semi-simple Banach algebra $C(\Delta)$,
via the Gelfand map.

Proof By induction on $n$,

\[ \|T\|^{2^n} \le (c\|T^2\|)^{2^{n-1}} \le \cdots \le c^{2^n - 1}\|T^{2^n}\|, \]

from which can be concluded

\[ \|T\| \le \lim_{n \to \infty} c^{1 - 2^{-n}}\|T^{2^n}\|^{2^{-n}} = c\,\rho(T). \]

This inequality has various strong implications:

$\mathcal{X}$ is semi-simple: 0 is clearly the only quasinilpotent.

$\mathcal{X}$ is commutative: For any $S, T \in \mathcal{X}$,

\[ \|ST\| \le c\,\rho(ST) = c\,\rho(TS) \le c\|TS\|. \]

Hence, the analytic function $F(z) := e^{-zT}Se^{zT}$ is bounded,

\[ \forall z \in \mathbb{C}, \quad \|F(z)\| \le c\|Se^{zT}e^{-zT}\| = c\|S\|. \]

By Liouville's theorem, $F$ must be constant, $e^{-zT}Se^{zT} = S$, that is, $e^{zT}S = Se^{zT}$.
Comparing the second terms of their power series expansions,

\[ (1 + zT + o(z))S = S(1 + zT + o(z)), \]

gives $TS = ST$.

The Gelfand map is an embedding: $\mathcal{G}$ has the trivial kernel $\mathcal{J}$, and is thus an
algebra isomorphism onto $\hat{\mathcal{X}} \subseteq C(\Delta)$. Moreover, $\|T\| \le c\,\rho(T) = c\|\hat T\|$ so $\mathcal{G}^{-1}$
is continuous. □

Exercises 14.40 

1. In $\mathbb{C}$, as well as $\mathbb{C}^N$, $\ell^\infty$ and $C[0, 1]$, the only quasinilpotent is 0.

2. Quasinilpotents are preserved by Banach algebra morphisms. 

3. A quasinilpotent upper triangular matrix must have 0s on the main diagonal, so
is nilpotent. Deduce, using the Jordan canonical form and Theorem 13.8, that
every quasinilpotent of a finite-dimensional Banach algebra is nilpotent.

4. $(Q, R) \in \mathcal{X} \times \mathcal{Y}$ is quasinilpotent (or radical) when both $Q$ and $R$ are.

5. The operator $V : \ell^\infty \to \ell^\infty$ defined by $V(a_n) := (0, a_0, a_1/2, a_2/3, \ldots)$ is
quasinilpotent.

6. Prove directly that the Volterra operator $f \mapsto \int_0^x f$, on $C[0, 1]$, is a quasinilpotent.

7. A quasinilpotent for which $\|(z - T)^{-1}\| \le \dfrac{c}{|z|^n}$ for all $z$ in a neighborhood of 0,
must in fact be a nilpotent. (Hint: use $\|T^n\| \le \frac{1}{2\pi}\oint |z|^n\|(z - T)^{-1}\|\,ds \le \epsilon c$.)

8. $\rho(TQS) = 0$ for any $S, T \in \mathcal{X}$, $Q \in \mathcal{J}$. (Hint: Example 14.2(5).)

9. If $\psi \in \Delta$ and $f \in C^\omega(\sigma(T))$, then $\psi f(T) = f(\psi T)$.
10. $\mathcal{S}(T)$ and $\Delta(T)$ have better properties than $\sigma(T)$, and may yield useful
information about it:

(a) $\mathcal{S}(S + T) \subseteq \mathcal{S}(S) + \mathcal{S}(T)$, $\mathcal{S}(1) = \{1\}$, $\mathcal{S}(\lambda T) = \lambda\mathcal{S}(T)$,

(b) $\Delta(S + T) \subseteq \Delta(S) + \Delta(T)$, $\Delta(ST) \subseteq \Delta(S)\Delta(T)$.

11. For $\mathbb{C}^N$, $\Delta = \{\delta_1, \ldots, \delta_N\}$ where $\delta_i(z_1, \ldots, z_N) := z_i$ are the dual basis. The
same is true for the space $c_0$, $\Delta = \{\, \delta_i \in c_0^* : \delta_i(a_0, a_1, \ldots) = a_i \,\}$.

12. For $B(\mathbb{C}^2)$ (and $B(\mathbb{C}^N)$), $\Delta = \emptyset$. (Hint: Consider products of
etc.)

13. For characters of the group algebra $\mathbb{C}^G$, $\psi(e_{h^{-1}gh}) = \psi(e_g)$ and $|\psi(e_g)| = 1$.

14. ► The invertible elements of a commutative $\mathcal{X}$ correspond to the invertible
elements of $\hat{\mathcal{X}}$.

15. The Gelfand transform on $\mathbb{C}^N$, mapping $\mathbb{C}^N \to C(\Delta) = \mathbb{C}^N$, is the identity
map. The same is true for $C[0, 1]$, so $\sigma(f) = \operatorname{im} f$ for $f \in C[0, 1]$.

16. ► The Gelfand transform gathers together various classical transforms under
one theoretical umbrella:

(a) Generating functions: $\mathcal{G} : \ell^1 \to C(D)$, maps a sequence $x = (a_n)$ to a
power series on $D$, the unit closed disk in $\mathbb{C}$,

\[ (a_n) \mapsto \sum_{n=0}^\infty a_nz^n. \]

(b) $\mathcal{G} : \ell^1(\mathbb{Z}) \to C(S^1)$ is similar, $\hat x(\theta) := \sum_{n=-\infty}^\infty a_ne^{in\theta}$. It follows that $\sigma(x) =
\{\, \hat x(\theta) : 0 \le \theta < 2\pi \,\}$, and the sequence $x$ is invertible in $\ell^1(\mathbb{Z})$ (in the
convolution sense) exactly when $\sum_n a_ne^{in\theta} \neq 0$ for all $\theta$. This is essentially
Wiener's theorem: If $f \in C(S^1)$ is nowhere 0 and $\hat f \in \ell^1(\mathbb{Z})$ then the Fourier
coefficients of $1/f$ are also in $\ell^1(\mathbb{Z})$.

(c) Fourier coefficients: $L^1(S^1) \to C(\mathbb{Z}) = \ell^\infty(\mathbb{Z})$,

\[ \hat f(n) := \int_0^{2\pi} e^{-in\theta}f(\theta)\,d\theta. \]

(d) Fourier transform: $L^1(\mathbb{R}) \to C(\mathbb{R})$,

\[ \hat f(\xi) := \int e^{-ix\xi}f(x)\,dx. \]

(e) Laplace transform: $L^1(\mathbb{R}^+) \to C(\mathbb{C}^+)$,

\[ \mathcal{L}f(s) := \int_0^\infty e^{-sx}f(x)\,dx, \qquad \operatorname{Re} s \ge 0. \]

In all these cases, $\widehat{f * g} = \hat f\hat g$.

17. * In any Banach algebra, if $ST = TS$ then $\sigma(S + T) \subseteq \sigma(S) + \sigma(T)$ and
$\sigma(ST) \subseteq \sigma(S)\sigma(T)$. (Hint: Consider the commutant algebra $\{S, T\}''$ (Exercise
13.10(14) and Example 13.5(6)).)

18. In a commutative Banach algebra, $e^{S+T} = e^Se^T$, and $De^T = e^T$.

The set of exponentials $e^{\mathcal{X}}$ is a connected group, so $e^{\mathcal{X}} = \mathcal{E} = \mathcal{G}_1$ (Proposition
13.24).

19. A Banach algebra which satisfies $\|T^2\| = \|T\|^2$ is isometrically isomorphic to
a subalgebra of $C(\Delta)$: the condition is equivalent to $\|T\| = \rho(T) = \|\hat T\|$.

20. Conversely to the proposition, a Banach algebra that can be embedded in some
$C(K)$ ($K$ compact) satisfies $\|T\|^2 \le c\|T^2\|$.

Remarks 14.41 

1. Given a compact set $K \subseteq \mathbb{C}$, is there an element $T$ with spectrum $\sigma(T) = K$? Of
course, this is false in the Banach algebra $\mathbb{C}$, where all spectra consist of single
points, and in $B(\mathbb{C}^N)$, where the spectra are finite sets of points. But in $\ell^\infty$ there
are elements with any given compact set $K$ for spectrum (Example 14.2(3)).

2. The distinction between $\sigma_p$, $\sigma_c$ and $\sigma_r$ is not purely of mathematical interest.
In quantum mechanics, a solution of Schrödinger's time-independent equation
$H\psi = E\psi$ gives energy-eigenvalues with eigenfunctions that are "localized"
(since $\psi \in L^2(\mathbb{R}^3)$), whereas the continuous spectrum corresponds to "free"
states.

3. Among the operators in Section 14.2, one can find examples without point,
continuous or residual spectra (and any combination thereof, except all empty). Note
also that the spectra of these examples are misleadingly easy to compute, in
contrast to generic operators.

4. There are various definitions of spectra of $T$ that are subsets of $\sigma(T)$. The singular
spectrum is the set of $\lambda$ such that $T - \lambda$ is a topological divisor of zero. The essential
spectrum consists of $\lambda$ such that $T - \lambda$ is not Fredholm.

5. Recalling $\rho_x(T) := \limsup_n \|T^nx\|^{1/n}$, defined for $T \in B(X)$ and $x \in X$ (Remark
13.32(4)), suppose a closed subset of the spectrum of $T$ is isolated from the rest
of the spectrum by a disk, $\sigma_1 \subseteq B_r(a)$. If $\rho_x(T - a) < r$ then $x \in X_1$ since

\[ P_1x = \frac{1}{2\pi i}\oint_{\sigma_1} (z - T)^{-1}x\,dz = \sum_n a_n(T - a)^nx = x. \]

6. The Gelfand transform can be extended to $\mathcal{S}(\mathcal{X}) \to \mathcal{S}(T)$ retaining the same
(non-multiplicative) properties.


Chapter 15 

C*-Algebras


B(H) is a special Banach algebra when H is a Hilbert space because there is an 
adjoint operation that pairs up operators together. Its properties can be generalized 
to Banach algebras as follows. 

Definition 15.1 


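A (unital) C*-algebra is a unital Banach algebra with an involution map $* :
\mathcal{X} \to \mathcal{X}$ having the properties:

\[ T^{**} = T, \qquad (S + T)^* = S^* + T^*, \qquad (\lambda T)^* = \bar\lambda T^*, \]
\[ (ST)^* = T^*S^*, \qquad \|T^*T\| = \|T\|^2. \]

A *-morphism is defined as a Banach algebra morphism $\Phi$ which also preserves
the involution $\Phi(T^*) = (\Phi T)^*$.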


Easy Consequences 

1. $0^* = 0$, $1^* = 1$, $z^* = \bar z$ (by expanding $(0 + 1)^*$, $(1^*1)^*$, and $(z1)^*$).

2. $\|T\| = \|T^*\|$ (since $\|T\|^2 = \|T^*T\| \le \|T^*\|\|T\|$, and so $\|T\| \le \|T^*\| \le
\|T^{**}\|$); the involution map is thus continuous and bijective. But it is neither
linear ($(iT)^* = -iT^*$), nor differentiable (since $(T + H)^* = T^* + H^*$).

3. $\|TT^*\| = \|T\|^2$.

4. $(T^*)^{-1} = (T^{-1})^*$ when $T$ is invertible.


J. Muscat, Functional Analysis, DOI: 10.1007/978-3-319-06728-5_15,
© Springer International Publishing Switzerland 2014




5. $\rho(T^*) = \rho(T)$, $\sigma(T^*) = \sigma(T)^*$ (since $(T^* - \bar\lambda)^{-1} = (T - \lambda)^{-1*}$).$^1$

One might expect that $\|T^*\| = \|T\|$ be taken as an axiom, and indeed Banach
algebras with involutions satisfying this weaker axiom are studied and called Banach
*-algebras. C*-algebras resemble $\mathbb{C}$ more closely, except for commutativity: the
chosen axiom, which is the analogue of the familiar one $\bar zz = |z|^2$, is much stronger
and can only be satisfied by a unique norm, if at all (Example 15.10(6)).

Examples 15.2 

1. The simplest example is $\mathbb{C}$ with conjugacy. $\mathbb{C}^N$ has an involution

$$(z_1, \ldots, z_N)^* := (\bar z_1, \ldots, \bar z_N).$$

This example extends to $\ell^\infty$.

2. $C[0, 1]$ with conjugacy, $\bar f(z) := \overline{f(z)}$.

3. B(H) with the adjoint operator, where H is a Hilbert space (Proposition 10.20). 
We will see later (Gelfand-Naimark’s Theorem 15.48) that every C*-algebra can 
be embedded into B(H) for some Hilbert space H. 

4. $B(H)$ contains the closed *-subalgebra

$$\mathbb{C} \oplus \mathcal{K} := \{\, a + T : a \in \mathbb{C},\ T \in B(H) \text{ compact} \,\}$$


5. If $\mathcal{X}$ and $\mathcal{Y}$ are C*-algebras then so is $\mathcal{X} \times \mathcal{Y}$ with $(S, T)^* := (S^*, T^*)$ (Examples
13.3(7)).

6. $\ell^1(\mathbb{Z})$ has an involution $(a_n)^* := (\bar a_{-n})$, that satisfies $\|x^*\| = \|x\|$ but not
$\|x^* * x\| = \|x\|^2$. However, it can be given a new norm, $|||x||| := \|L_x\|$ where
$L_x y := x * y$ for $y \in \ell^2$, and $L : x \mapsto L_x$ embeds $\ell^1(\mathbb{Z})$ as a commutative
C*-subalgebra of $B(\ell^2)$. Similarly for $L^1(\mathbb{R})$.

7. The group algebra $\mathbb{C}G$ has an involution making it a *-algebra, but not a C*-algebra,

$$\Bigl(\sum_{g \in G} a_g\, g\Bigr)^* := \sum_{g \in G} \bar a_g\, g^{-1}.$$

(However, it is a C*-algebra when represented by matrices and their norms.)

Exercises 15.3 

1. Polarization identity: If $\omega$ is a primitive $n$th root of unity, $\omega^n = 1$, then

$$T^*S = \frac{1}{n}\sum_{i=1}^{n} \omega^i (S + \omega^i T)^*(S + \omega^i T),$$

$$S^*S + T^*T = \frac{1}{n}\sum_{i=1}^{n} (S + \omega^i T)^*(S + \omega^i T).$$

¹ To avoid ambiguity with the closure $\bar A$ of a set $A \subseteq \mathbb{C}$, $A^*$ will denote the set of conjugate numbers.

2. For any real polynomial (or power series) in T , p(T)* = p(T*). 

3. If T is a nilpotent, a quasinilpotent, a divisor of zero, or a topological divisor of 
zero, then so is T*, respectively. If T*T is a nilpotent, then so is TT*; but find 
an example in B(l 2 ) where T*T is invertible yet TT* isn’t. 

4. If $T^*T$ and $TT^*$ are both invertible then so is $T$,

$$T^{-1} = (T^*T)^{-1}T^* = T^*(TT^*)^{-1}.$$

5. If the condition number of T is c, that of T*T is c 2 (Exercise 8.14(5)). 

6. The inner-automorphism $T \mapsto S^{-1}TS$ is a *-automorphism exactly when $SS^*$
belongs to the center $\mathcal{X}'$ (in which case $S^*S = SS^*$).

7. * A *-isomorphism $B(H_1) \to B(H_2)$ is of the type $T \mapsto LTL^{-1}$ where
$L = \lambda U$, $\lambda \ne 0$ real, and $U : H_1 \to H_2$ is a Hilbert-space isomorphism.

8. A *-ideal is an ideal that is closed under involution. Examples include the kernel 
of any *-morphism and the Jacobson radical. 

9. If $\mathcal{A} \subseteq \mathcal{X}$ is closed under adjoints ($\mathcal{A}^* = \mathcal{A}$), then so is its commutant $\mathcal{A}'$ (which
is thus a C*-subalgebra) (Exercise 13.10(14)).

10. * Suppose $\mathcal{X}$ has no unity but otherwise satisfies all the axioms of a C*-algebra.
Show that the embedding $L : \mathcal{X} \to B(\mathcal{X})$ (Theorem 13.8) is still isometric,
and that $L\mathcal{X} \oplus [\![I]\!]$ with the adjoint operation $(L_a + \lambda)^* := L_{a^*} + \bar\lambda$ is a unital
C*-algebra.


15.1 Normal Elements 


It is a well-known fact in Linear Algebra that real symmetric matrices are
diagonalizable with real eigenvalues and orthogonal eigenvectors. This makes
them particularly useful and simple to work with, e.g. if $T = PDP^{-1}$ then
$f(T) = Pf(D)P^{-1}$ can easily be calculated when $D$ is diagonal. However, these
matrices do not exhaust the set of matrices that are diagonalizable via orthogonal
eigenvectors: for example, such diagonalizable matrices may have complex
eigenvalues. As we shall see later, diagonalization is closely related to the
commutativity of $T$ with $T^*$.
Definition 15.4

An element $T$ is called normal when $T^*T = TT^*$, unitary when $T^* = T^{-1}$,
and self-adjoint when $T^* = T$.

Examples 15.5 

1. It is clear that self-adjoint and unitary elements are normal. 

2. Any $z \in \mathbb{C}$ is normal; it is self-adjoint only when $z \in \mathbb{R}$; it is unitary only when
$|z| = 1$.

3. A diagonal matrix is normal; it is self-adjoint when it is real, and unitary when
each diagonal element is of unit length $|a_{ii}| = 1$.

More generally, diagonalizable matrices, of the type T = U DU* where U is 
unitary and D is diagonal, are normal: T*T = UD*U*UDU* = UD*DU* = 
U DD*U* = TT*. 

4. The operator $Tf(x) := \int_0^1 k(x, y)f(y)\,dy$ on $L^2[0, 1]$ is normal when (Example
8.6(4c))

$$\int_0^1 \overline{k(s, x)}\,k(s, y)\,ds = \int_0^1 k(x, s)\,\overline{k(y, s)}\,ds \quad \text{a.e.}(x, y)$$

5. When $T$ is normal, a polynomial in $T$ and $T^*$ looks like

$$p(T, T^*) = \sum_{n=0}^{N}\sum_{m=0}^{M} a_{nm}\, T^n T^{*m}.$$

The set of such polynomials $\mathbb{C}[T, T^*]$ is a commutative *-subalgebra. The
character space of its closure $\overline{\mathbb{C}[T, T^*]}$ is denoted by $\Delta_T$.

6. A unitary matrix is a square matrix whose column vectors are orthonormal. A
self-adjoint matrix is a square matrix $[a_{ij}]$ such that $a_{ji} = \bar a_{ij}$.

Proof If $u_j$ denotes the $j$th column of $U$, then $U^*U = I$ implies

$$\langle u_i, u_j\rangle = u_i^* u_j = \delta_{ij}.$$


7. The unitary operators of B(H) are the Hilbert-space automorphisms of H 
(Proposition 10.23). 

8. ► If $T$ is normal, then so are $T^*$, $T + z$, $zT$, $T^n$, and $T^{-1}$ when it exists. But the
sum and product of normal elements need not be normal.

Proof for $T^{-1}$. Taking the inverse of $TT^* = T^*T$ together with $(T^{-1})^* =
(T^*)^{-1}$ gives the normality of $T^{-1}$.

9. ► If $T_n$ are normal and $T_n \to T$, then $T$ is also normal, i.e., the set of normal
elements is closed (as are the sets of self-adjoint and unitary elements).

Proof The limit as $n \to \infty$ of $T_n^*T_n = T_nT_n^*$ is $T^*T = TT^*$ since the adjoint is
continuous. Similarly take the limit of $T_n^* = T_n$ or $T_n^* = T_n^{-1}$ to prove the other
statements.

10. ► If $S$, $T$ are self-adjoint, then so are $S + T$, $\lambda T$ ($\lambda \in \mathbb{R}$), $p(T)$ for any real
polynomial $p$, and $T^{-1}$ if it exists. But $ST$ is self-adjoint iff $ST = TS$.

11. ► If $T$ is self-adjoint, then $e^{iT}$ is unitary; in fact, letting $U_t := e^{itT}$, $t \in \mathbb{R}$, gives
a one-parameter group of unitary elements (Exercises 13.25(9) for definition).
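As a numerical illustration of Example 11, the following minimal sketch approximates $e^{iT}$ for a $2 \times 2$ self-adjoint matrix by a truncated power series and measures how far $U^*U$ is from the identity. The matrix `T`, the helper names, and the truncation length are assumptions chosen for illustration only.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def adjoint(A):
    """Conjugate transpose of a square matrix."""
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]


def expm(A, terms=40):
    """Truncated power series 1 + A + A^2/2! + ... (adequate for small norms)."""
    n = len(A)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


# A hypothetical self-adjoint matrix (T* = T).
T = [[2.0 + 0j, 1 - 1j], [1 + 1j, -1.0 + 0j]]
U = expm([[1j * x for x in row] for row in T])

# Largest entry of U*U - I; it should be at rounding-error level.
UstarU = matmul(adjoint(U), U)
err = max(abs(UstarU[i][j] - (1 if i == j else 0))
          for i in range(2) for j in range(2))
print(err)
```

A series truncation is used here only to keep the sketch dependency-free; it is not a recommended way to compute matrix exponentials in general.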

The analogy of self-adjoint elements with real numbers and unitary elements with 
unit complex numbers raises the issue of which propositions about complex numbers 
generalize to C* -algebras. 

Proposition 15.6 


Every element T can be written uniquely as A + iB with A and B self- 
adjoint, called the real and imaginary parts of T , respectively. 


The real and imaginary parts of T are denoted Re T and Im T. 

Proof Simply check that $A := (T + T^*)/2$ and $B := (T - T^*)/2i$ are self-adjoint.
The sum $A + iB$ is obviously $T$. Uniqueness follows from the fact that if $A + iB = 0$
for $A$, $B$ self-adjoint then $A = 0 = B$ since

$$A = A^* = (-iB)^* = iB = -A. \qquad \Box$$
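The decomposition in the proof can be tried out on matrices. The sketch below computes $A = (T + T^*)/2$ and $B = (T - T^*)/2i$ for a sample complex matrix and checks that both are self-adjoint and that $A + iB$ recovers $T$; the matrix and the function names are illustrative assumptions.

```python
def adjoint(M):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]


def real_imag_parts(T):
    """Return (Re T, Im T) = ((T + T*)/2, (T - T*)/(2i))."""
    Ts = adjoint(T)
    n = len(T)
    A = [[(T[i][j] + Ts[i][j]) / 2 for j in range(n)] for i in range(n)]
    B = [[(T[i][j] - Ts[i][j]) / 2j for j in range(n)] for i in range(n)]
    return A, B


def max_diff(M, N):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(a - b) for rm, rn in zip(M, N) for a, b in zip(rm, rn))


# A hypothetical, non-normal sample matrix.
T = [[1 + 2j, 3 - 1j], [0.5j, -2 + 0j]]
A, B = real_imag_parts(T)
recombined = [[A[i][j] + 1j * B[i][j] for j in range(2)] for i in range(2)]

print(max_diff(A, adjoint(A)), max_diff(B, adjoint(B)), max_diff(recombined, T))
```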


Proposition 15.7 

The set of unitary elements $\mathcal{U}(\mathcal{X})$ is a closed subgroup of $\mathcal{G}(\mathcal{X})$,
$U$, $V$ unitary $\Rightarrow$ $UV$, $U^{-1}$ unitary.

Unitary elements have unit norm, $\|U\| = 1$.


Proof If $U_n$ are unitary and $U_n \to T$, then by continuity of the involution, $U_n^* \to T^*$.
Also, the equations $U_n^*U_n = 1 = U_nU_n^*$ become $T^*T = 1 = TT^*$ in the limit, that
is, $T^{-1} = T^*$.
For any $U, V \in \mathcal{U}(\mathcal{X})$, $UV$ and $U^*$ ($= U^{-1}$) are also unitary,

$$(UV)^* = V^*U^* = V^{-1}U^{-1} = (UV)^{-1},$$
$$U^{**} = U = (U^{-1})^{-1} = (U^*)^{-1}.$$

Finally, $\|U\|^2 = \|U^*U\| = \|1\| = 1$. □

The next theorem starts to unravel the close connection between normal elements 
and their spectra. 

Proposition 15.8 

For $T$ normal, $\rho(T) = \|T\|$, and $S(T)$ is the closed convex hull of $\sigma(T)$.

Proof (i) For any normal element $T$, $\|T^2\| = \|T\|^2$ since

$$\|T\|^4 = \|T^*T\|^2 = \|(T^*T)^*(T^*T)\| = \|(T^2)^*T^2\| = \|T^2\|^2.$$

But $T^2$ itself is normal, so the doubling game can be repeated to get, by induction,

$$\|T^{2^k}\| = \|T\|^{2^k},$$

$$\rho(T) = \lim_{n \to \infty} \|T^n\|^{1/n} = \lim_{k \to \infty} \|T^{2^k}\|^{2^{-k}} = \|T\|.$$

(ii) As $S(T)$ is a closed convex set that contains $\sigma(T)$ (Proposition 14.34), it must
also contain the convex hull of the latter. Notice that, by (i), $\sigma(T)$ reaches to the
boundary of $S(T)$.

Conversely, suppose $\lambda$ is not in the closed convex hull of $\sigma(T)$. There must be a
straight line through $\lambda$ not intersecting $\sigma(T)$ (why? Hint: consider rays emanating
from $\lambda$; they intersect the closed convex hull over an interval of angles). So
the spectrum can be enclosed by a ball $B_r[z]$ that does not meet the line
(Exercise 6.22(7)).

For any $\phi \in S$,

$$|\phi T - z| = |\phi(T - z)| \le \|T - z\| = \rho(T - z) \le r < |\lambda - z|$$

so $\lambda \ne \phi T$. It follows that $S(T)$ has the same points as the closed convex hull of
$\sigma(T)$. □
Proposition 15.9 Fuglede's theorem

If $T$ is normal and $ST = TS$ then $ST^* = T^*S$.


Proof From $f(T)S = Sf(T)$ (Example 14.23(1b)), we have $e^{-zT}Se^{zT} = S$. Writing
$zT = A + iB$ and noting that $zT$ is normal, so $AB = BA$ (Example 15.10(1a)),
we find

$$F(z) := e^{-\bar z T^*} S e^{\bar z T^*} = e^{-A+iB} S e^{A-iB} = e^{2iB} e^{-zT} S e^{zT} e^{-2iB} = e^{2iB} S e^{-2iB},$$

$\therefore \|F(z)\| \le \|S\|$ by Example 15.5(11).

As $F$ is a bounded analytic function of $\bar z$, by Liouville's theorem it is constant,
$F(z) = F(0) = S$, i.e., $e^{\bar z T^*}S = Se^{\bar z T^*}$. Comparing the second term of their power
series gives $T^*S = ST^*$. □

Examples 15.10 

1. If $T = A + iB$, where $A$, $B$ are self-adjoint, then $T^* = A - iB$ and

$$T^*T = (A^2 + B^2) + i[A, B],$$

$$TT^* = (A^2 + B^2) - i[A, B],$$

(note that $i[A, B] = \tfrac{1}{2}[T^*, T]$ is self-adjoint). So,

(a) $T$ is normal if, and only if, $AB = BA$;

(b) $T$ is unitary if, and only if, $AB = BA$ and $A^2 + B^2 = 1$;

(c) $T$ is self-adjoint if, and only if, $B = 0$.

2. $\mathcal{X}$ is commutative if, and only if, every element is normal.

Proof If every element is normal, then for any $T = A + iB$, $AB = BA$, i.e., any
two self-adjoint elements commute. But then $TS = (A + iB)(C + iD) = ST$.
The converse is obvious.

3. (a) For $T$ normal, $\|T^n\| = \|T\|^n$, since $\|T\| = \rho(T) \le \|T^n\|^{1/n} \le \|T\|$.

(b) For any $T$, $\|T\|^{2n} = \|(T^*T)^n\|$ and $\|T\| = \sqrt{\rho(T^*T)}$.

4. ► 0 is the only normal quasinilpotent and the only radical element, that is, every
C*-algebra is semi-simple. More generally, if $T$ is normal with $\sigma(T) = \{z\}$, then
$T = z$.

Proof If $Q$ is a normal quasinilpotent, then $\|Q\| = \rho(Q) = 0$, so $Q = 0$. If $P$ is
a radical element, then $\|P\|^2 = \|P^*P\| = \rho(P^*P) = 0$.

5. Every C*-algebra has a unique norm satisfying $\|T^*T\| = \|T\|^2$.

Proof Suppose there is a second C*-norm $|||\cdot|||$. Then the norms must agree on normal
elements, $\|T\| = \rho(T) = |||T|||$, and so must agree on all elements

$$\|T\| = \|T^*T\|^{1/2} = |||T^*T|||^{1/2} = |||T|||.$$
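The two product formulas in Example 15.10(1) follow from a direct expansion, written out here as a short worked derivation using only the self-adjointness of $A$ and $B$ and the notation $[A, B] := AB - BA$:

```latex
\begin{align*}
T^*T &= (A - iB)(A + iB) = A^2 + iAB - iBA + B^2 = (A^2 + B^2) + i[A, B],\\
TT^* &= (A + iB)(A - iB) = A^2 - iAB + iBA + B^2 = (A^2 + B^2) - i[A, B],\\
T^*T - TT^* &= 2i[A, B].
\end{align*}
```

The last line shows at once that $T$ is normal exactly when $[A, B] = 0$, which is statement (a).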


Exercises 15.11 

1. What are the normal, self-adjoint and unitary elements of l°° and C[0, 1]? 

2. Generalizing from diagonal matrices, any multiplier operator on $\ell^2$, $(a_n) \mapsto
(b_na_n)$ is normal. It is self-adjoint when $b_n \in \mathbb{R}$, and unitary when $|b_n| = 1$,
for all $n$.

Find similar conditions for a multiplier operator on $L^2(\mathbb{R})$, $Tf := gf$,
($g \in C(\mathbb{R})$).

3. Triangular matrices, such as (it) , are not normal (unless diagonal). A real 
diagonalizable matrix, such as (-•.;) , need not be self-adjoint. 

4. For any $T$, $\alpha T + \beta T^*$ is normal when $|\alpha| = |\beta|$.

5. A *-morphism preserves normal, self-adjoint, and unitary elements.

6. If $P_i$ are normal idempotents with $P_iP_j = \delta_{ij}P_i$ as well as $P_1 + \cdots + P_n = 1$, then
$z_1P_1 + \cdots + z_nP_n$ is normal (unitary when $|z_i| = 1$) and for any polynomial $p$,

$$p(z_1P_1 + \cdots + z_nP_n) = p(z_1)P_1 + \cdots + p(z_n)P_n.$$

Unitary elements 

7. The shift-operators on $\ell^2(\mathbb{Z})$ are unitary, with $\sigma(R) = \sigma(L) = S^1$ (but on $\ell^1$,
they are not even normal).

8. Translations $T_af(x) := f(x - a)$ and stretches $S_af(x) := a^{1/2}f(ax)$ ($a > 0$),
acting on $L^2(\mathbb{R})$, are unitary.

9. If $U$ is unitary then for any $T$, $\|UT\| = \|T\| = \|TU\|$.

10. If $U \in \mathcal{X}$ is unitary and $V := \lambda U$ ($\lambda \ne 0$), then $T \mapsto V^{-1}TV$ is an inner
*-automorphism of $\mathcal{X}$.

11. If $T$ is an invertible normal element, then $T^*T^{-1}$ is unitary.

For example, the Cayley transformation $U := (i - T^*)(i + T)^{-1}$ maps $T$ to a
unitary element if $i + T$ is invertible. Compare with the Möbius transformation
$z \mapsto (i - z)/(i + z)$, which takes $\mathbb{R}$ to the unit circle ($0 \mapsto 1$, $1 \mapsto i$, $\infty \mapsto -1$).

12. $\mathcal{U}(\mathcal{X})$ need not be a normal subgroup of $\mathcal{G}(\mathcal{X})$; when does $T^{-1}\mathcal{U}T \subseteq \mathcal{U}$ hold?
Self-Adjoint elements

13. The operator $Tf(x) := \int k(x, y)f(y)\,dy$ on $L^2(\mathbb{R})$ ($k \in L^2(\mathbb{R}^2)$) is self-adjoint
when $k(y, x) = \overline{k(x, y)}$ a.e. (Hint: Examples 10.24(3), 8.6(4)).

14. For any $T \in \mathcal{X}$, the elements $T + T^*$, $T^*T$ and $TT^*$ are self-adjoint.

15. The real and imaginary parts of $T$ satisfy $\|\operatorname{Re} T\| \le \|T\|$, $\|\operatorname{Im} T\| \le \|T\|$.

16. Find the real and imaginary parts of $ST$ when $S$ and $T$ are self-adjoint.

Spectra of Normal Elements 

17. For $S$, $T$ normal, $\rho(S + T) \le \rho(S) + \rho(T)$, and $\rho(ST) \le \rho(S)\rho(T)$.

18. When $T$ is normal, then $\|T\|e^{i\theta}$ is a spectral value for some $\theta$.

19. Let $Q \ne 0$ be a quasinilpotent, then $1 + Q$ is not normal. More generally, if $T$
is normal and $TQ = QT$, then $T + Q$ is not normal.

20. If $A^*B = 0 = AB^*$, then $\|A + B\| = \max(\|A\|, \|B\|)$. (Hint: Show
$\|A + B\|^{2n} = \|(A^*A)^n + (B^*B)^n\|$.)

21. If $S$ and $T$ are commuting normal elements, then $ST$ is also normal.

22. If $T^*T$ is an idempotent then so is $TT^*$.

23. A commutative C*-algebra is isometrically embedded in some $C(K)$ (Exercise
14.40(19)).

24. Let $\Phi : \mathcal{X} \to \mathcal{Y}$ be a *-morphism between C*-algebras with $\mathcal{X}$ commutative.
Then $\Phi(T)$ is normal in $\mathcal{Y}$ for any $T \in \mathcal{X}$, and $\Phi$ is continuous with $\|\Phi\| \le 1$
(Hint: $\sigma(\Phi(T)) \subseteq \sigma(T)$).


15.2 Normal Operators in B(H) 

Let us see what properties normal elements have for the most important C*-algebra, 
B(H) when H is a Hilbert space. 

Proposition 15.12 

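For a normal operator $T \in B(H)$,

(i) $\|T^*x\| = \|Tx\|$,

(ii) $\ker T^2 = \ker T = \ker T^* = (\operatorname{im} T)^\perp$,

(iii) $\operatorname{im} T$ is dense in $H$ $\Leftrightarrow$ $T$ is 1-1,

(iv) $T$ is invertible in $B(H)$ $\Leftrightarrow$ $\exists c > 0$, $\forall x \in H$, $c\|x\| \le \|Tx\|$.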
Proof (i) follows from

$$\|T^*x\|^2 = \langle T^*x, T^*x\rangle = \langle x, TT^*x\rangle = \langle x, T^*Tx\rangle = \langle Tx, Tx\rangle = \|Tx\|^2.$$

(ii) $\ker T = \ker T^*$ is due to $T^*x = 0 \Leftrightarrow \|T^*x\| = \|Tx\| = 0 \Leftrightarrow Tx = 0$, using
(i). $\ker T^2 = \ker T$, i.e., $T^2x = 0 \Leftrightarrow Tx = 0$ follows from

$$\|Tx\|^2 = \langle x, T^*Tx\rangle \le \|x\|\|T^*Tx\| = \|x\|\|T^2x\|.$$

From Proposition 10.21, $(\operatorname{im} T)^\perp = \ker T^* = \ker T$.

(iii) By (ii), $T$ is 1-1 if, and only if, $\overline{\operatorname{im} T} = (\ker T)^\perp = 0^\perp = H$.

(iv) If $T$ has a continuous inverse, then $\|x\| = \|T^{-1}Tx\| \le \|T^{-1}\|\|Tx\|$. Conversely,
if the given inequality is true for all $x \in H$, then $T$ is 1-1 and the image of $T$ is closed
(Examples 8.13(3)). By (iii), $\operatorname{im} T = H$ and $T$ is bijective. Its inverse is continuous:

$$c\|T^{-1}x\| \le \|TT^{-1}x\| = \|x\|, \quad \forall x \in H. \qquad \Box$$


Proposition 15.13 

For a normal operator $T \in B(H)$,

(i) $Tv = \lambda v \Leftrightarrow T^*v = \bar\lambda v$, and eigenvectors of distinct eigenvalues of
$T$ are orthogonal,

(ii) $\sigma(T)$ contains no residual spectrum, $\sigma_r(T) = \emptyset$,

(iii) isolated points of $\sigma(T)$ are eigenvalues.

Proof (i) is a direct application of $\ker(T - \lambda) = \ker(T^* - \bar\lambda)$, as $T - \lambda$ is normal.
Note that the eigenvectors of $T$ and $T^*$ are identical. For eigenvalues $\lambda$ and $\mu$ with
corresponding eigenvectors $x$ and $y$, we have

$$\lambda\langle y, x\rangle = \langle y, Tx\rangle = \langle T^*y, x\rangle = \langle \bar\mu y, x\rangle = \mu\langle y, x\rangle,$$

implying either $\lambda = \mu$, or $\langle y, x\rangle = 0$.

(ii) Let $\lambda \in \sigma(T)$; either $T - \lambda$ is not 1-1, in which case $\lambda$ is an eigenvalue (point
spectrum); or it is 1-1, in which case its image is dense in $H$ by the previous proposition,
and $\lambda$ forms part of the continuous spectrum.

(iii) If $\{\lambda\}$ is an isolated point of $\sigma(T)$, form the projection

$$P := \frac{1}{2\pi i}\oint_{\{\lambda\}} (z - T)^{-1}\,dz$$

onto a space $X_\lambda \ne 0$ (Example 14.28(1)). Then $\sigma(T|_{X_\lambda}) = \{\lambda\}$, and since $T|_{X_\lambda}$ is
normal as well, $\|T|_{X_\lambda} - \lambda\| = \rho(T|_{X_\lambda} - \lambda) = 0$, i.e., $Tx = \lambda x$ for any $x \in X_\lambda$. □

Examples 15.14 

1. ► A projection $P \in B(H)$ is normal $\Leftrightarrow$ self-adjoint $\Leftrightarrow$ orthogonal $\Leftrightarrow$ $\|P\| = 0$
or 1.

Proof If $P$ is orthogonal (Theorem 10.12), then $(x - Px) \perp Px$, so

$$\langle x, Px\rangle = \langle (I - P)x + Px, Px\rangle = \|Px\|^2 \in \mathbb{R},$$

hence $\langle x, Px\rangle = \langle Px, x\rangle = \langle x, P^*x\rangle$ for all $x \in H$, and $P = P^*$ (Example
10.7(3)).

If $\|P\| = 1$, let $x \in (\ker P)^\perp$, so that $x \perp x - Px$. Then $\|Px\|^2 = \|x\|^2 +
\|Px - x\|^2$, yet $\|Px\| \le \|x\|$, so $x = Px \in \operatorname{im} P$ and $\ker P \perp \operatorname{im} P$. The other
implications should be obvious.

2. All spectral values of a normal operator are approximate eigenvalues (either
eigenvalues or part of the continuous spectrum) and there are no proper generalized
eigenvectors (Section 14.3). Note that a normal operator need not have any
eigenvalues, e.g. $Tf(x) := xf(x)$ on $L^2[0, 1]$.

Exercises 15.15 

1. ► Conversely to the proposition, an operator which satisfies $\|T^*x\| = \|Tx\|$ for
all $x$ is normal.

2. When $T$ is a normal operator, $\ker T$ and $\operatorname{im} T$ are both $T$- and $T^*$-invariant.

3. Suppose $T_nx \to Tx$ for all $x \in H$ where $T_n$ are normal operators in $B(H)$.
($T$ is an operator by Corollary 11.35.) Then $T$ is normal if, and only if, $\forall x$,
$T_n^*x \to T^*x$.

4. The eigenvalues of self-adjoint operators are real, and those of unitary operators
satisfy $|\lambda| = 1$.

5. A normal operator on a separable Hilbert space can have at most a countable
number of distinct eigenvalues.

6. Suppose $H$ has an orthonormal basis of eigenvectors of an operator $T \in B(H)$.
Show that $T$ is normal. (Hint: show $\|T^*x\| = \|Tx\|$.)

7. If $Tx = \lambda x$, $T^*y = \mu y$, and $\bar\mu \ne \lambda$ then $\langle y, x\rangle = 0$ ($T$ not necessarily normal).

8. An Ergodic Theorem: Consider the Cesàro sum

$$T_n := (I + T + \cdots + T^{n-1})/n.$$

If $\rho(T) < 1$ then $T_n \approx (I - T)^{-1}/n \to 0$ as $n \to \infty$. Now let $T$ be a normal
operator with $\rho(T) = 1$.

(a) For $Tx = x$ (i.e., $x \in \ker(T - I)$), we get $T_nx = x$;

(b) For $x = y - Ty \in \operatorname{im}(T - I)$ we get $T_nx = (y - T^ny)/n \to 0$;

(c) For any $x \in H$, $T_nx \to x_0 \in \ker(T - I)$, the closest fixed point of $T$.

If $T$ is not normal then $T_n$ may diverge, e.g. $T = \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}$ gives
$T_n = \begin{pmatrix} 1 & (n-1)a/2 \\ 0 & 1 \end{pmatrix}$.
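The contrast in the ergodic exercise can be observed numerically. The following sketch computes the Cesàro mean $T_n = (I + T + \cdots + T^{n-1})/n$ for the non-normal shear matrix with upper-right entry $a$, and for a rotation by a quarter turn (a normal matrix of spectral radius 1 with no non-zero fixed vectors). The value of $a$, the choice of $n$, and the function names are assumptions made for illustration.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cesaro_mean(T, n):
    """Return (I + T + ... + T^(n-1)) / n."""
    size = len(T)
    power = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    total = [[0.0] * size for _ in range(size)]
    for _ in range(n):
        total = [[total[i][j] + power[i][j] for j in range(size)]
                 for i in range(size)]
        power = matmul(power, T)
    return [[entry / n for entry in row] for row in total]


a = 2.0
shear = [[1.0, a], [0.0, 1.0]]          # not normal
rotation = [[0.0, -1.0], [1.0, 0.0]]    # normal, rho = 1, ker(T - I) = 0

shear_mean = cesaro_mean(shear, 11)      # off-diagonal entry (n - 1) a / 2
rotation_mean = cesaro_mean(rotation, 401)

print(shear_mean[0][1])
print(max(abs(e) for row in rotation_mean for e in row))
```

The off-diagonal entry of the shear mean grows linearly in $n$, whereas the rotation mean shrinks towards 0, the projection onto the (trivial) fixed space.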

The Numerical Range 

To help us further with analyzing the spectra of normal operators, we require an
additional tool. A given vector $x$ need not, of course, be an eigenvector of an operator
$T$, but we can ask for that value of $\lambda$ which minimizes $\|Tx - \lambda x\|$. According to
Theorem 10.12 there is indeed a unique vector $\lambda x \in [\![x]\!]$ which is closest to $Tx$,
and it satisfies $(Tx - \lambda x) \perp x$, or equivalently, $\lambda = \langle x, Tx\rangle/\|x\|^2$. This number is
sometimes called the mean value of $T$ at $x$, or the Rayleigh coefficient, and denoted
by $\langle T\rangle_x$. We are thus led to the following definition:

Definition 15.16


The numerical range of an operator $T \in B(H)$ is the set
$W(T) := \{\, \langle x, Tx\rangle : \|x\| = 1 \,\}$.
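To make the definition concrete, the sketch below samples points $\langle x, Tx\rangle/\|x\|^2$ of the numerical range of a small matrix from random vectors. It uses the convention that the inner product is conjugate-linear in its first argument. The test matrix (a $2 \times 2$ Jordan block, whose numerical range is known to be the closed disk of radius 1/2 about its eigenvalue), the sample count, and the function names are assumptions for illustration.

```python
import random


def rayleigh(T, x):
    """Mean value <x, Tx> / ||x||^2 of the matrix T at the non-zero vector x."""
    n = len(x)
    Tx = [sum(T[i][j] * x[j] for j in range(n)) for i in range(n)]
    numerator = sum(x[i].conjugate() * Tx[i] for i in range(n))
    return numerator / sum(abs(c) ** 2 for c in x)


def sample_numerical_range(T, count=2000, seed=0):
    """Return `count` points of W(T) obtained from random complex vectors."""
    rng = random.Random(seed)
    n = len(T)
    points = []
    for _ in range(count):
        x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        points.append(rayleigh(T, x))
    return points


a = 1 + 1j
jordan = [[a, 1], [0, a]]      # spectrum {a}, but W(T) is a disk around a
points = sample_numerical_range(jordan)
max_dist = max(abs(p - a) for p in points)

hermitian = [[2, 1 - 1j], [1 + 1j, -1]]
max_imag = max(abs(p.imag) for p in sample_numerical_range(hermitian))

print(max_dist, max_imag)
```

Random sampling only fills in the interior of $W(T)$ slowly near its boundary; plotting `points` (with any plotting tool) still gives a reasonable picture of its shape.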


Examples 15.17 

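1. $\langle I\rangle_x = 1$, $\langle T + S\rangle_x = \langle T\rangle_x + \langle S\rangle_x$, $\langle \lambda T\rangle_x = \lambda\langle T\rangle_x$, $\langle T^*\rangle_x = \overline{\langle T\rangle_x}$.

These are easily verified, e.g.

$$\langle x, T^*x\rangle = \langle Tx, x\rangle = \overline{\langle x, Tx\rangle}.$$

2. ► For operators on a complex Hilbert space,

(a) $W(I) = \{1\}$, and $W(z) = \{z\}$ ($z \in \mathbb{C}$),

(b) $W(T + z) = W(T) + z$ (translations), and $W(\lambda T) = \lambda W(T)$,

(c) $W(S + T) \subseteq W(S) + W(T)$,

(d) $W(T^*) = W(T)^*$.

3. $W(T)$ includes the eigenvalues of $T$ and is bounded by $\|T\|$.

Proof If $Tx = \lambda x$ for $x$ a unit vector, then $\langle T\rangle_x = \langle x, Tx\rangle = \lambda$. Also, for unit
$x$, $|\langle x, Tx\rangle| \le \|Tx\| \le \|T\|$ by the Cauchy-Schwarz inequality.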
4. Although the quadratic form $x \mapsto \langle x, Tx\rangle$ is unique to $T$, i.e., $\langle x, Tx\rangle = \langle x, Sx\rangle$
for all $x$ if, and only if, $T = S$ (Example 10.7(3)), the numerical range $W(T)$
does not identify $T$ in general, e.g. $W(U^{-1}TU) = W(T)$ when $U$ is unitary.

5. For a fixed unit $x \in H$, one can define two semi-inner-products on $B(H)$,

(a) $\langle S, T\rangle := \langle Sx, Tx\rangle = \langle S^*T\rangle_x$ (with associated semi-norm $|||T|||_x :=
\|Tx\|$), and

(b) the covariance semi-inner-product

$$\operatorname{Cov}(S, T) := \langle S - \langle S\rangle_x, T - \langle T\rangle_x\rangle = \langle S^*T\rangle_x - \overline{\langle S\rangle_x}\langle T\rangle_x,$$

with the associated semi-norm called the standard deviation

$$\sigma_T^2 := \operatorname{Cov}(T, T) = \|Tx\|^2 - |\langle T\rangle_x|^2.$$

(c) The uncertainty principle states that $\sigma_S\sigma_T \ge |\operatorname{Cov}(S, T)|$ (essentially the
Cauchy-Schwarz inequality (Exercise 10.10(17))). The normalized inner
product $\operatorname{Cov}(S, T)/\sigma_S\sigma_T$ is called the correlation; $T$ and $S$ are called
independent when they are orthogonal, $\operatorname{Cov}(S, T) = 0$, so that $\langle S, T\rangle =
\overline{\langle S\rangle_x}\langle T\rangle_x$.

These definitions are usually applied to $L^2(A)$, where $x$ corresponds to a function
$p \in L^2(A)$, with $|p(s)|^2$ interpreted as a probability distribution, and the operators
are multiplications by functions $T_fp := fp$, that is,

the mean $\langle f\rangle_p = \int_A f(s)|p(s)|^2\,ds$, the rms $|||f|||_p = \sqrt{\int_A |f|^2|p|^2}$,
$\operatorname{Cov}(f, g) = \int_A \overline{(f - \langle f\rangle)}(g - \langle g\rangle)|p|^2$.

We can now elucidate the connection between the numerical range and the spec- 
trum of an operator, hinted at in the examples above. 

Proposition 15.18 (Hausdorff-Toeplitz) 


$\overline{W(T)}$ is a convex compact subset of $\mathbb{C}$, such that


$$\sigma(T) \subseteq \overline{W(T)} \subseteq S(T).$$


Proof Recall the state space $S(\mathcal{X})$ from Definition 14.33, where we now take the
case $\mathcal{X} = B(H)$. The inclusion $W(T) \subseteq S(T)$ is obvious: for any unit vector $x$, the
functional $\phi(T) := \langle x, Tx\rangle$ is linear in $T$, maps $I$ to 1, and $|\phi(T)| = |\langle x, Tx\rangle| \le
\|T\|$, so $\|\phi\| = 1$ and $\phi \in S$. As $S(T)$ is compact (Proposition 14.34), so must be
its closed subset $\overline{W(T)}$.
The main part of the proof is to show the other inclusion $\sigma(T) \subseteq \overline{W(T)}$: for
$\|x\| = 1$, $\lambda \in \mathbb{C}$,

$$\alpha := d(\lambda, W(T)) \le |\langle x, Tx\rangle - \lambda| = |\langle x, (T - \lambda)x\rangle| \le \|(T - \lambda)x\|,$$

so for any $x \in H$,

$$\alpha\|x\| \le \|(T - \lambda)x\|.$$

When $\lambda \notin \overline{W(T)}$, $\alpha$ is strictly positive, and the inequality shows that $T - \lambda$ is
1-1 with a closed image (Example 8.13(3)). Moreover, since $W(T^*) = W(T)^*$ and
$d(\bar\lambda, W(T)^*) = d(\lambda, W(T))$,

$$\alpha\|x\| \le \|(T^* - \bar\lambda)x\|.$$

This implies that $(T - \lambda)^*$ is 1-1, hence $T - \lambda$ is onto (Proposition 10.21). Thus
$T - \lambda$ has an inverse, which is continuous (Proposition 8.12),

$$\alpha\|(T - \lambda)^{-1}x\| \le \|(T - \lambda)(T - \lambda)^{-1}x\| = \|x\|,$$

and $\lambda \notin \sigma(T)$.

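$W(T)$ is convex: Given $\lambda$, $\mu$ in $W(T)$ ($\lambda \ne \mu$), let $x$, $y$ be unit vectors such that
$\langle x, Tx\rangle = \lambda$, $\langle y, Ty\rangle = \mu$. Any vector $v := \alpha e^{i\phi_1}x + \beta e^{i\phi_2}y$ ($\alpha, \beta, \phi_1, \phi_2 \in \mathbb{R}$)
has norm

$$\|v\|^2 = \alpha^2 + 2\alpha\beta \operatorname{Re} e^{i(\phi_2 - \phi_1)}\langle x, y\rangle + \beta^2 = 1 + \sin 2\theta \operatorname{Re}(e^{i\phi}\langle x, y\rangle),$$

for $\alpha = \cos\theta$, $\beta = \sin\theta$, $\phi := \phi_2 - \phi_1$. Then $\langle v, Tv\rangle$ works out to

$$\begin{aligned}
\langle \alpha e^{i\phi_1}x + \beta e^{i\phi_2}y,\ \alpha e^{i\phi_1}Tx + \beta e^{i\phi_2}Ty\rangle
&= \alpha^2\lambda + \alpha\beta(e^{i\phi}\langle x, Ty\rangle + e^{-i\phi}\langle y, Tx\rangle) + \beta^2\mu \\
&= \lambda\cos^2\theta + \sin 2\theta\,(w\cos\phi + z\sin\phi) + \mu\sin^2\theta \\
&= \frac{\lambda + \mu}{2} + \frac{\lambda - \mu}{2}\cos 2\theta + (w\cos\phi + z\sin\phi)\sin 2\theta
\end{aligned}$$

where $w := \tfrac{1}{2}(\langle x, Ty\rangle + \langle y, Tx\rangle)$, $z := \tfrac{i}{2}(\langle x, Ty\rangle - \langle y, Tx\rangle)$. But $w_\phi := w\cos\phi +
z\sin\phi$ traces out an ellipse as $\phi$ varies. By choosing the correct value of $\phi$, $w_\phi$ can
be made to point in any direction in the complex plane, including that of $\lambda - \mu$. With
this choice, $\langle v, Tv\rangle/\|v\|^2$ gives a line segment as $\theta$ varies, a line that contains $\lambda$ and
$\mu$ (at $\theta = 0, \pi/2$). Thus $W(T)$, and its closure $\overline{W(T)}$, are convex sets. □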

As an immediate corollary, this proposition allows us to identify the self-adjoint 
operators among the normal ones from their spectrum: 
Proposition 15.19


For a normal operator $T$,

(i) $\overline{W(T)} = S(T)$,

(ii) $T$ is self-adjoint $\Leftrightarrow$ $W(T)$ is real.


Proof (i) By the previous theorem, $\overline{W(T)} \subseteq S(T)$, so the reverse inclusion remains
to be shown. The closure of the numerical range $\overline{W(T)}$ is a convex set containing
$\sigma(T)$, so it contains its closed convex hull, which is $S(T)$ when $T$ is normal
(Proposition 15.8).

(ii) When $T \in B(H)$ is self-adjoint, $\langle x, Tx\rangle = \langle Tx, x\rangle = \overline{\langle x, Tx\rangle}$ for all $x \in H$,
which implies $W(T) \subseteq \mathbb{R}$. Conversely, if $\langle x, Tx\rangle \in \mathbb{R}$ for all vectors $x$, then

$$\langle Tx, x\rangle = \langle x, Tx\rangle = \langle T^*x, x\rangle$$

which can only hold when $T^* = T$ (Example 10.7(3)). Note that this implies that $T$
is self-adjoint $\Leftrightarrow$ $\sigma(T) \subseteq \mathbb{R}$, since $W(T)$ would be a line interval. □

Exercises 15.20 

1. $W(T) = \{z\} \Leftrightarrow T = z$.

2. Show that, for the shift operators on l 2 , W(L) = If [0] = W(R). 

3. Let $T$ be a square matrix $\begin{pmatrix} A & B \\ C & D \end{pmatrix}$ with respect to an orthonormal basis, where
$A$, $D$ are square sub-matrices.

(a) $W(A) \cup W(D) \subseteq W(T)$.

(b) If $B = C = 0$, then $W(T)$ is the closed convex hull of $W(A) \cup W(D)$.

4. Write a program that plots $W(T)$ for $2 \times 2$ matrices, and test it on random
matrices. Verify, and then prove, that $W(T)$ for

(a) $T := \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$ is the line joining $a$ to $b$;

(b) $T := \begin{pmatrix} a & 1 \\ 0 & a \end{pmatrix}$ is the closed disk $B_{1/2}[a]$ (although its spectrum is $\{a\}$);

(c) * $T := \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is generically an ellipse with its interior.
5. Let $T$ be a square matrix with positive coefficients. If $x = (a_1, \ldots, a_N) \in \mathbb{C}^N$
and $x_+ := (|a_1|, \ldots, |a_N|)$, then

$$|\langle x, Tx\rangle| \le \langle x_+, Tx_+\rangle$$

so that the largest extent of $W(T)$ (and $W(T^*)$) is a positive real number.

6. The classical proofs of some of the statements above do not use the convexity
properties of the numerical range. For a self-adjoint operator $T$,

(a) $\sigma(T)$ is real. Prove this by letting $\lambda := \alpha + i\beta$ with $\beta \ne 0$, and showing

$$\|(T - \lambda)x\|^2 = \|(T - \alpha)x\|^2 + \beta^2\|x\|^2 \ge |\beta|^2\|x\|^2.$$

(b) $\overline{W(T)}$ is the smallest interval containing $\sigma(T)$. Show this by taking $\sigma(T) \subseteq
[a, b]$, letting $c := (a + b)/2$, and proving that for any unit vector $x$,

$$|\langle x, Tx\rangle - c| = |\langle x, (T - c)x\rangle| \le b - c = c - a.$$

7. For any $T \in B(H_1, H_2)$, $\overline{W(T^*T)} = [a, b]$, where $a \ge 0$ and $b = \|T\|^2$.

8. If $\lambda \notin \overline{W(T)}$, then $\|(\lambda - T)^{-1}\| \le 1/d(\lambda, W(T))$.

9. A coercive operator $T \in B(H)$ satisfies $|\langle x, Tx\rangle| \ge c > 0$ for all unit $x \in H$.
Show that it has a continuous inverse. An elliptic operator is one which satisfies
$\langle x, Tx\rangle \ge c > 0$, a special case of a coercive self-adjoint operator.

10. Let $\phi : B(H) \to \mathbb{C}$ be defined by $T \mapsto \langle x, Ty\rangle$ for some fixed unit $x, y \in H$;
show that $\phi \in S \Leftrightarrow x = y$.

11. (a) $\operatorname{Cov}(I, T) = 0$, $\operatorname{Cov}(S, T + \lambda) = \operatorname{Cov}(S, T)$, $\sigma_{T+\lambda} = \sigma_T$;

(b) For every $\lambda$, $\sigma_T \le \|(T - \lambda)x\|$, so $\sigma_T \le \tfrac{1}{2}\operatorname{diam}\sigma(T)$ for $T$ normal;

(c) $\sigma_T = 0 \Leftrightarrow x$ is an eigenvector of $T$, with eigenvalue $\langle T\rangle_x$.

(d) If S, 7 are self-adjoint operators, let A := j[.S - , 7] and h (A) x /2 = 
Co v(S, 7), then 

o'svt ^ h. 


15.3 The Spectral Theorem for Compact Normal Operators 

As seen before, multiplier operators such as diagonal matrices are normal. In fact, 
all normal operators are of this type; we show this first in the simple case of compact 
normal operators. 
Theorem 15.21 Spectral Theorem for Compact Normal Operators


If $T$ is a compact normal operator on a Hilbert space, then

$$Tx = \sum_{n=0}^{\infty} \lambda_n\langle e_n, x\rangle e_n,$$

where $e_n$ are the eigenvectors of $T$ with corresponding non-zero eigenvalues
$\lambda_n$.


The statement is written supposing an infinite number of eigenvectors; otherwise 
the sum is finite. 

Proof Let $T$ be a compact normal operator. We show that $H$ has an orthonormal
basis of eigenvectors.

(a) The fact that $T$ is compact implies that the non-zero part of its spectrum
consists of a countable set of eigenvalues, and each generalized eigenspace
$X_\lambda := \ker(T - \lambda)^{k_\lambda}$ is finite-dimensional (Theorem 14.18).

(b) The fact that $T - \lambda$ is normal implies, first, that $X_\lambda = \ker(T - \lambda)$ consists
of eigenvectors, and second, that the $X_\lambda$ are orthogonal to each other (Propositions
15.12, 15.13).

Note that the eigenvalues decrease to 0 (unless there are a finite number of them).
This is part of Theorem 14.18, but its proof in the present context is much simpler:
As $T$ is compact, for any infinite set of orthonormal eigenvectors $e_n$, $Te_n$ ($= \lambda_ne_n$)
has a Cauchy subsequence, so

$$|\lambda_n|^2 + |\lambda_m|^2 = \|\lambda_ne_n - \lambda_me_m\|^2 = \|Te_n - Te_m\|^2 \to 0, \quad \text{as } n, m \to \infty$$

implying both $\lambda_n \to 0$ and that each eigenspace $\ker(T - \lambda)$ is finite-dimensional.

Thus a countable number of orthonormal eigenvectors $e_n$ (a finite number from
each $X_\lambda$) account for all the non-zero eigenvalues, and form an orthonormal basis
for the closed space $M := [\![e_1, e_2, \ldots]\!]$ generated by them. $M^\perp$ is $T$-invariant since
$x \in M^\perp$ implies that for all $n$, $\langle e_n, x\rangle = 0$, and as $T^*e_n = \bar\lambda_ne_n$,

$$\langle e_n, Tx\rangle = \langle T^*e_n, x\rangle = \lambda_n\langle e_n, x\rangle = 0.$$

Thus $T$ can be restricted to $M^\perp$, when it remains compact (Exercise 6.9(5)) and
normal, yet without non-zero eigenvalues, because those are all accounted for by
the eigenvectors in $M$. Its spectrum must therefore be 0, implying $T|_{M^\perp} = 0$, i.e.,
$M^\perp = \ker T$. Unless $M^\perp = 0$, there is an orthonormal basis of eigenvectors $e_\alpha$ for
it, and collectively with $e_n$, they form a basis for $H = M \oplus M^\perp$,

$$x = \sum_n \langle e_n, x\rangle e_n + \sum_\alpha \langle e_\alpha, x\rangle e_\alpha.$$

Finally, since $T$ is linear and continuous, and $Te_\alpha = 0$, we find that

$$Tx = T\Bigl(\sum_n \langle e_n, x\rangle e_n\Bigr) = \sum_n \langle e_n, x\rangle Te_n = \sum_n \langle e_n, x\rangle\lambda_ne_n.$$

Corollary 15.22 Spectral Theorem in Finite Dimensions 
A normal complex matrix is diagonalizable. 
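The finite-dimensional statement can be checked on a small example. The sketch below takes the quarter-turn rotation matrix, which is normal with eigenvalues $\pm i$, and compares $Tx$ computed directly with the spectral sum $\sum_n \lambda_n\langle e_n, x\rangle e_n$. The matrix, its hand-computed eigenpairs, the test vector, and the function names are assumptions chosen for illustration; the inner product is taken conjugate-linear in its first argument.

```python
import math


def inner(u, v):
    """Inner product <u, v>, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def apply(T, x):
    """Matrix-vector product T x."""
    return [sum(T[i][j] * x[j] for j in range(len(x))) for i in range(len(T))]


def spectral_apply(eigenpairs, x):
    """Evaluate sum_n lambda_n <e_n, x> e_n for orthonormal eigenvectors e_n."""
    out = [0j] * len(x)
    for lam, e in eigenpairs:
        coeff = lam * inner(e, x)
        out = [o + coeff * ei for o, ei in zip(out, e)]
    return out


s = complex(1 / math.sqrt(2))
R = [[0j, -1 + 0j], [1 + 0j, 0j]]          # normal: R*R = RR* = I
e1, e2 = [s, -1j * s], [s, 1j * s]         # orthonormal eigenvectors
eigenpairs = [(1j, e1), (-1j, e2)]         # R e1 = i e1, R e2 = -i e2

x = [2 + 1j, -1 + 3j]
direct = apply(R, x)
via_spectrum = spectral_apply(eigenpairs, x)
gap = max(abs(a - b) for a, b in zip(direct, via_spectrum))
print(direct, via_spectrum, gap)
```

Note that this real matrix has no real eigenvalues, so the diagonalization only exists over the complex numbers.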


There is a remarkable generalization of this diagonalization to any compact oper- 
ator between Hilbert spaces, including rectangular matrices: 

Theorem 15.23 Singular Value Decomposition (SVD) 

If T : X -> Y is a compact operator between Hilbert spaces, then there 
are isometry operators U : Y -> Y and V : X -> X such that T = UDV* 
with D diagonal. 


Proof $T^*T$ and $TT^*$ are compact self-adjoint operators, on $X$ and $Y$ respectively.
They share the same non-zero eigenvalues (Examples 14.10(5)), which are strictly
positive, since if $T^*Tv = \lambda v$, $\|v\| = 1$, then

$$\lambda = \langle v, T^*Tv\rangle = \|Tv\|^2 > 0.$$

By the spectral theorem there is an orthonormal set of eigenvectors $v_n \in X$ of $T^*T$
with eigenvalues $\lambda_n = \sigma_n^2 > 0$. It turns out that the vectors $Tv_n \in Y$ are also
orthogonal,

$$\langle Tv_m, Tv_n\rangle = \langle v_m, T^*Tv_n\rangle = \sigma_n^2\delta_{nm},$$

so $u_n := Tv_n/\sigma_n$ form an orthonormal set in $Y$. Note that, by the above,

$$Tv_n = \sigma_nu_n, \qquad T^*u_n = \sigma_nv_n.$$

The positive numbers $\sigma_n$ are called the singular values of $T$ and $v_n$, $u_n$ are called its
singular vectors ($u_n$ are also called the principal components of $T$). In fact, $v_n$ form
an orthonormal basis for $(\ker T^*T)^\perp = (\ker T)^\perp = \overline{\operatorname{im} T^*}$, and similarly $u_n$ is an
orthonormal basis for $\overline{\operatorname{im} T}$ (Exercise 10.26(8) and Proposition 10.21).

It follows that for any $x \in X$ and $y \in Y$,

$$x = Px + \sum_n \langle v_n, x\rangle v_n, \qquad Tx = \sum_n \sigma_n\langle v_n, x\rangle u_n, \qquad T^*y = \sum_n \sigma_n\langle u_n, y\rangle v_n$$
n n n 
where $P \in B(X)$ is the orthogonal projection onto $\ker T$. Indeed a stronger statement
is true:

$$T = \sum_n \sigma_nu_nv_n^*$$

That is, the convergence is in norm, not just pointwise, the reason being

$$\Bigl\|\Bigl(T - \sum_{n=1}^{N} \sigma_nu_nv_n^*\Bigr)x\Bigr\|^2 = \Bigl\|\sum_{n=N+1}^{\infty} \sigma_n\langle v_n, x\rangle u_n\Bigr\|^2 = \sum_{n=N+1}^{\infty} \sigma_n^2|\langle v_n, x\rangle|^2 \le \bigl(\max_{n>N}\sigma_n^2\bigr)\|x\|^2$$

and $\max_{n>N}\sigma_n \to 0$ as $N \to \infty$ since $\sigma_n \to 0$ as $n \to \infty$.

Let $U$ be that operator representing a change of basis in $\operatorname{im} T$ from $u_n$ to some
arbitrary basis (leaving the perpendicular space $\ker T^*$ invariant), $V$ a similar change
of basis in $\operatorname{im} T^*$ from $v_n$. Then the 'matrix' of $T$ with respect to $v_n$ and $u_n$ is
$D := U^*TV$; as $Tv_n = \sigma_nu_n$ and $Tx = 0$ for $x \in \ker T$, $D$ is diagonal. □
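The first step of the proof, that the singular values are the square roots of the eigenvalues of $T^*T$, can be carried out by hand for $2 \times 2$ matrices. The following sketch does exactly that, solving the characteristic equation of $G = T^*T$ in closed form. The sample matrix and the function name are illustrative assumptions; this is not meant as a general-purpose SVD routine.

```python
import math


def singular_values_2x2(T):
    """Singular values of a 2x2 matrix via the eigenvalues of G = T*T."""
    a, b, c, d = (complex(v) for row in T for v in row)
    # Entries of the self-adjoint matrix G = T*T.
    g11 = abs(a) ** 2 + abs(c) ** 2
    g22 = abs(b) ** 2 + abs(d) ** 2
    g12 = a.conjugate() * b + c.conjugate() * d
    # Eigenvalues of G from its trace and determinant.
    trace = g11 + g22
    det = g11 * g22 - abs(g12) ** 2
    root = math.sqrt(max(trace * trace / 4 - det, 0.0))
    lam_max = trace / 2 + root
    lam_min = max(trace / 2 - root, 0.0)
    return math.sqrt(lam_max), math.sqrt(lam_min)


T = [[3, 0], [4, 5]]           # hypothetical sample; T*T = [[25, 20], [20, 25]]
s1, s2 = singular_values_2x2(T)
print(s1, s2, s1 * s2)         # the product equals |det T|
```

For this sample matrix the eigenvalues of $T^*T$ are 45 and 5, so the singular values are $\sqrt{45}$ and $\sqrt{5}$, and their product is $|\det T| = 15$.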

Examples 15.24 

1 . The spectral theorem is often stated as: If a compact normal operator has “matrix” 
T with respect to a given orthonormal basis e n , then T = U DU~ l , where D is 
diagonal and U is the unitary change-of-basis operator that maps (e„) to (e„), the 
orthonormal basis of eigenvectors of T. 

2. The converse of the spectral theorem is true, i.e., defining the operator

$$Tx := \sum_{n=0}^{\infty} \lambda_n\langle e_n, x\rangle e_n$$

in terms of an orthonormal basis, with $\lambda_n \to 0$, gives a compact normal operator:
compact because it is the limit of finite-rank operators, normal because
$\|Tx\|^2 = \sum_n |\lambda_n|^2|\langle e_n, x\rangle|^2 = \|T^*x\|^2$.

3. Given a compact normal operator in $B(H)$, and any function $f \in C(\sigma(T))$, with
$f(0) = 0$, one can define the compact operator $f(T)$ by the formula

$$f(T)x := \sum_{n=0}^{\infty} f(\lambda_n)\langle e_n, x\rangle e_n.$$
For example. 
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(a) y/T is compact when T is a self-adjoint compact operator with positive 
eigenvalues, 

(b) for any A ^ 0, there is a projection P\ f\(T), where f\ is a continuous 
function which takes the value 1 around A and 0 around all other eigenvalues. 

4. The projections $P_n$ to the eigenspaces $X_{\lambda_n}$ of $T$ commute and are orthogonal, so
$E_n := P_1 + \cdots + P_n$ is a projection onto $X_{\lambda_1} + \cdots + X_{\lambda_n}$ (Exercise 8.17(2)). The
spectral decomposition can be rewritten as $Tx = \sum_n \lambda_n\, \delta E_n x$, where $\delta E_n :=
E_n - E_{n-1} = P_n$. This can be seen as a breakup of $T = \frac{1}{2\pi i}\oint z(z - T)^{-1}\,dz$
into integrals on the disconnected components of the spectrum.

5. According to SVD, any matrix $T$ can be approximated by $\sum_n \sigma_n A_n$ where
$A_n = u_n v_n^*$ and the sum is taken over the largest singular values. Typically,
data from variables $x_1, \ldots, x_n$ is organized in the form of a matrix $T$ with the
rows representing the different variables and the columns the normalized measured
instances; the resulting $u_n$ associated with the largest singular values are
linear combinations of the variables $x_n$ that account for the most variability in the
data.
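As an illustration of this use of the SVD, here is a small sketch on synthetic data. Everything about the data set is an assumption made for the example: three variables, 200 instances, the second variable nearly twice the first, and "normalized" taken to mean that each row is centred.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 3 variables (rows), 200 measured instances (columns);
# the second variable is almost a multiple of the first.
z = rng.normal(size=200)
data = np.vstack([z,
                  2 * z + 0.05 * rng.normal(size=200),
                  0.3 * rng.normal(size=200)])
T = data - data.mean(axis=1, keepdims=True)   # centre each variable (assumed normalization)

U, s, Vh = np.linalg.svd(T, full_matrices=False)
explained = s**2 / np.sum(s**2)               # share of total variability per singular value
T1 = s[0] * np.outer(U[:, 0], Vh[0])          # leading term sigma_1 u_1 v_1^*
```

The first left singular vector `U[:, 0]` is close to a multiple of $(1, 2, 0)$, the combination of variables carrying most of the variability, and the Hilbert-Schmidt error of keeping only `T1` is the root sum of squares of the discarded singular values.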

6. If $T \in B(X)$ is compact normal, then the singular values of $T$ are the absolute
values of its non-zero eigenvalues.

Proof Clearly, if $Tx = \lambda x$ then $T^*Tx = |\lambda|^2 x$. Conversely, if $T^*Tx = \mu x$
($\mu \ne 0$) then

$$0 = (T^*T - \mu)x = \sum_n (|\lambda_n|^2 - \mu)\langle e_n, x\rangle e_n$$

so $\mu = |\lambda_n|^2$ for some $n$.

Exercises 15.25 

1. Find the singular values and vectors of $\begin{pmatrix} 2 & 3 \\ 0 & 2 \end{pmatrix}$ and …

2. If S and T are commuting self-adjoint compact operators, then they are simulta- 
neously diagonalizable (Hint: consider S + iT). 

3. (a) Let $T$ be an $n \times n$ self-adjoint matrix, with eigenvalues $\lambda_1 \le \cdots \le \lambda_n$
(including repeated eigenvalues), and corresponding orthonormal eigenvectors
$v_1, \ldots, v_n$. If $M$ is a closed linear subspace, with orthogonal projection
$P$, then the restriction of $PTP$ to $M$ is also self-adjoint with eigenvalues, say,
$\mu_1 \le \cdots \le \mu_m$, and corresponding orthonormal eigenvectors $u_1, \ldots, u_m$.
Taking a unit vector $x \in [\![u_1, \ldots, u_i]\!] \cap [\![v_i, \ldots, v_n]\!] \ne 0$, we get

$$\mu_1 \le \langle x, Tx\rangle \le \mu_i \quad\text{and}\quad \lambda_i \le \langle x, Tx\rangle \le \lambda_n.$$

It follows that $\lambda_i \le \mu_i$. Similarly, take $x \in [\![u_i, \ldots, u_m]\!] \cap [\![v_1, \ldots, v_{i+n-m}]\!]
\ne 0$ to deduce $\mu_i \le \lambda_{i+n-m}$. Combining the results we get

$$\lambda_i \le \mu_i \le \lambda_{n-m+i}.$$

(b) Interlacing theorem: If the $k$th row and column of a self-adjoint matrix are
removed, the new eigenvalues $\mu_i$ are interlaced with the old ones $\lambda_i$:

$$\lambda_1 \le \mu_1 \le \lambda_2 \le \cdots \le \lambda_{n-1} \le \mu_{n-1} \le \lambda_n.$$
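The interlacing inequalities are easy to test numerically. The sketch below assumes NumPy, a random self-adjoint matrix, and an arbitrarily chosen index `k` for the deleted row and column; it is a sanity check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 2
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
T = (B + B.conj().T) / 2                      # self-adjoint matrix

keep = [i for i in range(n) if i != k]        # delete the k-th row and column
T_sub = T[np.ix_(keep, keep)]

lam = np.linalg.eigvalsh(T)                   # lambda_1 <= ... <= lambda_n
mu = np.linalg.eigvalsh(T_sub)                # mu_1 <= ... <= mu_{n-1}

# lambda_i <= mu_i <= lambda_{i+1}, up to rounding
interlaced = bool(np.all(lam[:-1] <= mu + 1e-12) and np.all(mu <= lam[1:] + 1e-12))
```

Deleting a row and column corresponds to taking $M$ of dimension $m = n - 1$ in part (a), so that $\lambda_i \le \mu_i \le \lambda_{i+1}$.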


4. Picard's criterion: Suppose $T \in B(X, Y)$ is a compact operator on Hilbert spaces
$X$, $Y$, having singular values $\sigma_n$ and singular vectors $v_n$, $u_n$. In solving $Tx = y$,
we find that $\langle u_n, y\rangle = \sigma_n \langle v_n, x\rangle$ for all $n$. A necessary condition is $(\langle u_n, y\rangle/\sigma_n) \in
\ell^2$ as well as $y \in (\ker T^*)^\perp$. Thus the coefficients of $y$ must 'diminish faster'
than $\sigma_n$.

5. Truncated Singular Value Decomposition (TSVD) The series solution

$$x = \sum_n \frac{\langle u_n, y\rangle}{\sigma_n}\, v_n$$

of $T^*Tx = T^*y$ need not converge in general. Even if it does, any small errors in
$\langle u_n, y\rangle$ are magnified as $\sigma_n \to 0$. In practice, the series is truncated at some stage
to avoid this. The cutoff point is best taken when the error in $y$ becomes appreciable
compared to $\sigma_n$. Use the Tikhonov regularization method (Section 10.5) to derive
another way of doing this (for the right choice of $\alpha$),

$$x = \sum_n \frac{|\sigma_n|^2}{|\sigma_n|^2 + \alpha}\, \frac{\langle u_n, y\rangle}{\sigma_n}\, v_n.$$

But any other weighting $\sum_n w_n \frac{\langle u_n, y\rangle}{\sigma_n} v_n$, where $w_n$ vanishes sufficiently rapidly
as $\sigma_n \to 0$, is just as valid.
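The three weightings (none, truncation, Tikhonov) can be compared on a small synthetic problem. This is a sketch under stated assumptions: the operator is a hypothetical $40 \times 40$ real matrix with singular values $2^{-k}$, the noise level $10^{-6}$, the cutoff $10^{-4}$ and $\alpha = 10^{-8}$ are arbitrary illustrative choices, and `weighted_solution` is a helper name introduced here.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
# Hypothetical ill-conditioned operator with singular values sigma_k = 2^{-k}
U, _ = np.linalg.qr(rng.normal(size=(n, n)))
V, _ = np.linalg.qr(rng.normal(size=(n, n)))
sigma = 2.0 ** -np.arange(n)
T = (U * sigma) @ V.T

x_true = V[:, :5] @ np.ones(5)                 # lies in the span of the first five v_n
y = T @ x_true + 1e-6 * rng.normal(size=n)     # data with a small error

coeff = U.T @ y                                # the coefficients <u_n, y>

def weighted_solution(w):
    """x = sum_n w_n <u_n, y>/sigma_n v_n for a given weighting w_n."""
    return V @ (w * coeff / sigma)

x_naive = weighted_solution(np.ones(n))                     # full series, no weighting
x_tsvd = weighted_solution((sigma > 1e-4).astype(float))    # truncate where sigma_n <= 1e-4
alpha = 1e-8
x_tikh = weighted_solution(sigma**2 / (sigma**2 + alpha))   # Tikhonov weights

err_naive, err_tsvd, err_tikh = (np.linalg.norm(x - x_true)
                                 for x in (x_naive, x_tsvd, x_tikh))
```

The unweighted series amplifies the data error by up to $1/\sigma_n \approx 2^{39}$, while both regularized solutions stay close to `x_true`.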

6. It is instructive to compare with the case of solving the equation $(T - \lambda)x = y$
where $T$ is compact normal in $B(H)$ and $0 \ne \lambda \in \sigma(T)$ (the case $\lambda \notin \sigma(T)$ is trivial). It
has a solution $\Leftrightarrow$ $y \in \ker(T - \lambda)^\perp$. That solution of minimum norm is then

$$x = \sum_n \frac{\langle e_n, y\rangle}{\lambda_n - \lambda}\, e_n - y_0/\lambda,$$

where the sum is taken over $\lambda_n \ne \lambda, 0$, and $y_0$ is the projection of $y$ to $\ker T$.
There is no issue of convergence of the series as $|\lambda_n - \lambda| \ge c > 0$.

7. * If $T$ is a compact normal operator, then the iteration $v_{n+1} := Tv_n/\|Tv_n\|$
(starting from a generic vector $v_0$) converges to an eigenvector of the largest
eigenvalue, if this is unique and strictly positive. What happens otherwise?
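The iteration can be tried out on a matrix with a prescribed spectrum. The sketch below assumes NumPy and a real symmetric matrix whose eigenvalue of largest absolute value, 3, is unique and positive; the number of steps (200) is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))    # orthonormal eigenvectors (columns)
lam = np.array([3.0, 2.0, -1.5, 1.0, 0.5, 0.25, -0.1, 0.0])
T = (Q * lam) @ Q.T                             # normal (here real symmetric) matrix

v = rng.normal(size=n)                          # generic starting vector v_0
for _ in range(200):
    w = T @ v
    v = w / np.linalg.norm(w)                   # v_{n+1} := T v_n / ||T v_n||

rayleigh = v @ T @ v                            # estimate of the dominant eigenvalue
alignment = abs(v @ Q[:, 0])                    # |<v, e_1>|, equal to 1 for an eigenvector
```

The components along the other eigenvectors decay like $(|\lambda_n|/\lambda_1)^k$, here at worst $(2/3)^k$. Changing `lam` so that the dominant eigenvalue is negative, or repeated in modulus, is a quick way to experiment with the question posed at the end of the exercise.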
Ideals of Compact Operators

Another way of looking at the spectral theorem (or even the singular value
decomposition), is the following:

Proposition 15.26 


Any compact operator on a separable Hilbert space can be approximated 
by a square matrix. 

A compact normal operator on a separable complex Hilbert space can be 
approximated by a diagonalizable matrix. 


Proof An operator $T \in B(H)$ takes the matrix form, in terms of a countable
orthonormal basis $e_i$ of $H$,

$$\begin{pmatrix} P_nTP_n & P_nT(I - P_n) \\ (I - P_n)TP_n & (I - P_n)T(I - P_n) \end{pmatrix}$$

where $P_n$ is the self-adjoint/orthogonal projection onto $[\![e_1, \ldots, e_n]\!]$ (Example
15.14(1)). Note that for any vector $x \in H$, $P_nx \to x$ as $n \to \infty$ (Theorem 10.31).
The claim is that when $T$ is compact, the finite square matrices $P_nTP_n$ converge to
$T$. This is the same as claiming that the other three sub-matrices vanish as $n \to \infty$.

$(I - P_n)T \to 0$: Suppose, for contradiction, that there are unit vectors $x_n$ such
that $\|(I - P_n)Tx_n\| \ge c > 0$. Since $T$ is compact, there is a convergent subsequence
$Tx_n \to x$, hence

$$(I - P_n)Tx_n = (I - P_n)x + (I - P_n)(Tx_n - x) \to 0$$

leads to an impossibility.

$(I - P_n)TP_n \to 0$ and $(I - P_n)T(I - P_n) \to 0$ now follow from $\|P_n\| = 1 =
\|I - P_n\|$. Finally, $T(I - P_n) \to 0$ is also true and follows from $(I - P_n)T^* \to 0$,
since $T^*$ is also a compact operator (Proposition 11.31).

For a compact normal operator, the orthonormal basis $e_i$ can be chosen to consist
of the eigenvectors of $T$ by the Spectral Theorem, in which case $P_nTP_n$ is a diagonal
matrix

$$P_nTP_n = \sum_{i=1}^{n} \lambda_i e_i e_i^*.$$
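The convergence $P_nTP_n \to T$ can be observed on a large matrix standing in for a compact operator. The following sketch assumes NumPy; the matrix $T_{ij} = 1/(i+j+1)^2$ (indices from 0) and its size 300 are illustrative choices, and `section_error` is a name introduced for this example.

```python
import numpy as np

M = 300                                          # size of the matrix standing in for T
i = np.arange(M)
T = 1.0 / (1.0 + i[:, None] + i[None, :]) ** 2   # T_ij = 1/(i+j+1)^2, square-summable entries

def section_error(n):
    """Operator norm of T - P_n T P_n, where P_n projects onto the first n basis vectors."""
    R = T.copy()
    R[:n, :n] = 0.0                              # remove the block P_n T P_n
    return np.linalg.norm(R, 2)

errors = [section_error(n) for n in (5, 20, 80)]
```

The remainder `R` collects the other three sub-matrices of the block form above, and its norm shrinks as `n` grows.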
Proposition 15.27


The compact operators of finite rank acting on a Hilbert space $H$ form
a simple *-ideal $\mathcal{K}_f(H)$, which is contained in every non-zero ideal of $B(H)$.

The closure of $\mathcal{K}_f(H)$ in $B(H)$ is the *-ideal of compact operators $\mathcal{K}(H)$.


Proof The facts that the sum of compact operators, the product of a compact operator
with any other operator, and the adjoint of a compact operator, are compact have
already been proved earlier (Propositions 11.9 and 11.31), so $\mathcal{K}(H)$ is a *-ideal
in $B(H)$.

Similarly, it is not difficult to show that the sum of two finite-rank operators, and 
the product (left or right) of a finite-rank operator with any other operator, are again 
finite-rank. The details are left to the reader. 

Let $\mathcal{I}$ be an ideal in $B(H)$ which contains a non-zero operator $S$. There exist
non-zero vectors $a$, $b$ such that $Sa = b$. For any vectors $x$, $y \ne 0$, define the operator
$E_{xy} := xy^*/\|y\|^2$, so that $E_{xy}y = x$, but $E_{xy}u = 0$ whenever $u \perp y$. The operator
$E_{xb}SE_{ay}$ has precisely the same effect

$$E_{xb}SE_{ay}y = E_{xb}Sa = E_{xb}b = x, \qquad E_{xb}SE_{ay}u = 0 \quad (u \perp y),$$

so $E_{xy} = E_{xb}SE_{ay} \in \mathcal{I}$. Now let $T$ be any operator on $H$. If $e_1, \ldots, e_N$ are linearly
independent in $(\ker T)^\perp$ then $Te_1, \ldots, Te_N$ remain linearly independent in $\operatorname{im} T$,
for

$$T\Big(\sum_i a_ie_i\Big) = \sum_i a_iTe_i = 0 \;\Rightarrow\; \sum_i a_ie_i \in \ker T \cap (\ker T)^\perp = 0 \;\Rightarrow\; a_i = 0, \quad i = 1, \ldots, N.$$

Thus, if $T$ is of finite rank then $(\ker T)^\perp$ is finite-dimensional and has a finite
orthonormal basis $e_1, \ldots, e_N$, say, extended to an orthonormal basis for all of $H$.
Incidentally, this shows that $T^*$ is also of finite rank, since $\operatorname{im} T^* = (\ker T)^\perp$. Given any
vector $x = \sum_n \alpha_ne_n \in H$,

$$Tx = T\Big(\sum_n \alpha_ne_n\Big) = \sum_{n=1}^{N} \alpha_nTe_n = \sum_{n=1}^{N} \alpha_nE_{Te_n,e_n}e_n = \sum_{n=1}^{N} E_{Te_n,e_n}x$$

so $T$ is a linear combination of operators $E_{Te_n,e_n}$ and belongs to $\mathcal{I}$. We have shown
that $\mathcal{K}_f(H) \subseteq \mathcal{I}$ and $\mathcal{K}_f(H)$ is closed under adjoints.

In particular $\mathcal{K}_f(H)$ contains no non-zero ideals; we say it is simple. That the
closure of $\mathcal{K}_f(H)$ is $\mathcal{K}(H)$ is essentially the content of the previous proposition: More
precisely, recall that the image of a compact operator is separable, so $M := \operatorname{im} T$
has a countable basis $(e_i)$. Let $P_n$ be the orthogonal projection onto $[\![e_1, \ldots, e_n]\!]$.
Then, as in the proof of the previous proposition, the finite-rank operators $P_nT$
converge to $T$. □
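The key algebraic step of the proof, that every rank-one operator $E_{xy}$ can be manufactured from a single non-zero $S$ by left and right multiplication, can be verified on matrices. This is a minimal sketch assuming NumPy and random complex data; `E` and `cvec` are helper names introduced for the illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5

def cvec():
    """A random complex vector."""
    return rng.normal(size=n) + 1j * rng.normal(size=n)

def E(x, y):
    """Rank-one operator E_xy = x y^* / ||y||^2, so that E_xy y = x and E_xy u = 0 for u orthogonal to y."""
    return np.outer(x, y.conj()) / np.vdot(y, y).real

S = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))   # any non-zero operator
a = cvec()
b = S @ a                                                    # S a = b, non-zero here
x, y = cvec(), cvec()

lhs = E(x, b) @ S @ E(a, y)          # obtained from S by left and right multiplication

# A matrix T is the sum of the rank-one operators E_{T e_n, e_n} over an orthonormal basis
T_rebuilt = sum(E(S @ e, e) for e in np.eye(n))
```

Here `lhs` coincides with `E(x, y)`, and `T_rebuilt` recovers `S` itself, mirroring the decomposition $T = \sum_n E_{Te_n,e_n}$ used for finite-rank operators.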

Examples 15.28 

1. The ideal of compact operators, being the closure $\mathcal{K}(H) = \overline{\mathcal{K}_f(H)}$, is contained
in every closed ideal of $B(H)$.

2. The algebra of matrices $B(\mathbb{C}^N) = \mathcal{K}_f(\mathbb{C}^N) = \mathcal{K}(\mathbb{C}^N)$ is simple.

3. O The above argument can be extended to show, more generally, that compact 
operators on a Banach space with a Schauder basis can be approximated by finite- 
rank operators. Spaces for which this is true are said to have the “approximation 
property”; even separable spaces may fail to have this property [41]. 


Hilbert-Schmidt Operators 
Definition 15.29 


The trace of an operator $T$ on a Hilbert space with an orthonormal basis $e_n$,
is, when finite,

$$\operatorname{tr}(T) := \sum_n \langle e_n, Te_n\rangle.$$

A Hilbert-Schmidt operator is one such that $\operatorname{tr}(T^*T) = \sum_n \|Te_n\|^2$ is finite.


As defined, the trace of an operator can depend on the choice of orthonormal
basis. But for a Hilbert-Schmidt operator, $\operatorname{tr}(T^*T)$ is well-defined as the proof of the
next proposition shows:

Proposition 15.30 

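If the right-hand traces exist,

$$\operatorname{tr}(S + T) = \operatorname{tr}(S) + \operatorname{tr}(T), \qquad \operatorname{tr}(\lambda T) = \lambda \operatorname{tr}(T), \qquad \operatorname{tr}(T^*) = \overline{\operatorname{tr}(T)}.$$

If $S$, $T$ are Hilbert-Schmidt, then $\operatorname{tr}(ST) = \operatorname{tr}(TS)$.


Proof The identities $\operatorname{tr}(S + T) = \operatorname{tr}(S) + \operatorname{tr}(T)$ and $\operatorname{tr}(\lambda T) = \lambda \operatorname{tr}(T)$ follow easily
from the linearity of the inner product and summation, while

$$\operatorname{tr}(T^*) = \sum_n \langle e_n, T^*e_n\rangle = \sum_n \langle Te_n, e_n\rangle = \overline{\sum_n \langle e_n, Te_n\rangle} = \overline{\operatorname{tr}(T)}.$$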
Let $e_n$ and $\tilde e_m$ be orthonormal bases for the Hilbert space $H$; then $Te_n =
\sum_m \langle \tilde e_m, Te_n\rangle \tilde e_m$ and $STe_n = \sum_m \langle \tilde e_m, Te_n\rangle S\tilde e_m$, so

$$\operatorname{tr}(ST) = \sum_n \langle e_n, STe_n\rangle = \sum_{n,m} \langle \tilde e_m, Te_n\rangle \langle e_n, S\tilde e_m\rangle = \sum_m \langle \tilde e_m, TS\tilde e_m\rangle, \qquad (15.1)$$

exchanging the order of summation. This would be justified if the convergence is
absolute, which is the case when $S^*$ and $T$ are Hilbert-Schmidt,

$$\sum_{n,m} |\langle \tilde e_m, Te_n\rangle \langle e_n, S\tilde e_m\rangle| \le \sqrt{\sum_{n,m} |\langle \tilde e_m, Te_n\rangle|^2}\, \sqrt{\sum_{n,m} |\langle e_n, S\tilde e_m\rangle|^2} = \sqrt{\sum_n \|Te_n\|^2}\, \sqrt{\sum_n \|S^*e_n\|^2}, \qquad (15.2)$$

applying the Cauchy-Schwarz inequality and Parseval's identity. So, putting $S = T^*$
and $\tilde e_n = e_n$ in (15.1) shows that $\operatorname{tr}(T^*T) = \operatorname{tr}(TT^*)$, when $T$ is Hilbert-Schmidt,
i.e., $T^*$ is also Hilbert-Schmidt. This, in turn, implies that when $S$ and $T$ are
Hilbert-Schmidt, (15.2) and (15.1) are satisfied, so $\operatorname{tr}(TS) = \operatorname{tr}(ST)$ (in particular $\operatorname{tr}(T^*T)$)
is independent of the orthonormal basis. □
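In finite dimensions every operator is Hilbert-Schmidt, so the identities of the proposition can be checked directly. The sketch assumes NumPy and random complex matrices; `trace_in_basis` is a helper written for this example, computing $\sum_n \langle e_n, Ae_n\rangle$ with the inner product conjugate-linear in the first argument.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
S = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
T = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

def trace_in_basis(A, basis):
    """sum_n <e_n, A e_n> over the columns e_n of `basis`."""
    return sum(np.vdot(e, A @ e) for e in basis.T)

std = np.eye(n)                                   # standard orthonormal basis
Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))  # another one

tr_ST = trace_in_basis(S @ T, std)                # computed in one basis
tr_TS = trace_in_basis(T @ S, Q)                  # computed in the other basis
hs_sq = trace_in_basis(T.conj().T @ T, Q)         # tr(T*T)
```

The two traces agree although they are computed in different bases, as in (15.1), and `hs_sq` equals the sum of the squared moduli of the entries of `T`.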

Theorem 15.31 


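The Hilbert-Schmidt operators of $B(H)$ form a Hilbert space $\mathcal{HS}$, with
inner product

$$\langle S, T\rangle_{\mathcal{HS}} := \operatorname{tr}(S^*T) = \sum_n \langle Se_n, Te_n\rangle,$$

which is a *-ideal of compact operators, and

$$\|T\| \le \|T\|_{\mathcal{HS}}, \qquad \|ST\|_{\mathcal{HS}} \le \|S\|\,\|T\|_{\mathcal{HS}}.$$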


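Proof Let $e_n$ be an orthonormal basis for $H$. First note that $\|T\|_{\mathcal{HS}} := \sqrt{\langle T, T\rangle_{\mathcal{HS}}} =
\sqrt{\operatorname{tr}(T^*T)}$ is finite for Hilbert-Schmidt operators.

(i) We have remarked in the preceding proposition that if $T \in \mathcal{HS}$ then $T^* \in \mathcal{HS}$,
and

$$\|T^*\|_{\mathcal{HS}} = \sqrt{\operatorname{tr}(TT^*)} = \sqrt{\operatorname{tr}(T^*T)} = \|T\|_{\mathcal{HS}}.$$

The product $\langle S, T\rangle := \operatorname{tr}(S^*T)$ is finite and independent of the choice of
orthonormal basis when $S, T \in \mathcal{HS}$, by (15.1) and (15.2). Moreover, both of the following
traces are finite,

$$\operatorname{tr}\,(S + T)^*(S + T) = \operatorname{tr}(S^*S) + \operatorname{tr}(S^*T) + \operatorname{tr}(T^*S) + \operatorname{tr}(T^*T),$$
$$\operatorname{tr}\,(\lambda T)^*(\lambda T) = |\lambda|^2 \operatorname{tr}(T^*T),$$

so that $\mathcal{HS}$ is a vector space.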

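Linearity and 'symmetry' of the product follow from

$$\langle S, T_1 + T_2\rangle = \operatorname{tr}(S^*T_1 + S^*T_2) = \operatorname{tr}(S^*T_1) + \operatorname{tr}(S^*T_2) = \langle S, T_1\rangle + \langle S, T_2\rangle,$$
$$\langle S, \lambda T\rangle = \operatorname{tr}(S^*\lambda T) = \lambda \operatorname{tr}(S^*T) = \lambda \langle S, T\rangle,$$
$$\langle T, S\rangle = \operatorname{tr}(T^*S) = \operatorname{tr}\big((S^*T)^*\big) = \overline{\operatorname{tr}(S^*T)} = \overline{\langle S, T\rangle}.$$

That $\|T\| \le \|T\|_{\mathcal{HS}}$ (and hence $\|T\|_{\mathcal{HS}} = 0 \Rightarrow T = 0$) follows from

$$\|Tx\| = \Big\|\sum_n \langle e_n, x\rangle Te_n\Big\| \le \sum_n |\langle e_n, x\rangle|\,\|Te_n\| \le \sqrt{\sum_n |\langle e_n, x\rangle|^2}\, \sqrt{\sum_n \|Te_n\|^2} = \|x\|\,\|T\|_{\mathcal{HS}}.$$

$\langle \cdot, \cdot\rangle$ is therefore a legitimate inner product on $\mathcal{HS}$.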

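Finally, $\mathcal{HS}$ is an ideal of $B(H)$, since for any $S \in B(H)$ and $T \in \mathcal{HS}$,

$$\|ST\|_{\mathcal{HS}}^2 = \sum_n \|STe_n\|^2 \le \sum_n \|S\|^2\|Te_n\|^2 = \|S\|^2\|T\|_{\mathcal{HS}}^2,$$

and $\|TS\|_{\mathcal{HS}} = \|(TS)^*\|_{\mathcal{HS}} \le \|S^*\|\,\|T^*\|_{\mathcal{HS}} = \|S\|\,\|T\|_{\mathcal{HS}}$.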


(ii) Hilbert-Schmidt operators are compact: Given $T \in \mathcal{HS}$, define the finite-rank
operator $T_N$ by

$$T_Ne_n := \begin{cases} Te_n & \text{if } n \le N \\ 0 & \text{if } n > N \end{cases}.$$

Then

$$\|T - T_N\|^2 \le \|T - T_N\|_{\mathcal{HS}}^2 = \sum_{n=1}^{\infty} \|(T - T_N)e_n\|^2 = \sum_{n=N+1}^{\infty} \|Te_n\|^2 \to 0 \quad \text{as } N \to \infty.$$

$T$ is thus the limit of finite-rank operators, making it compact (Proposition 11.9).

(iii) The space $\mathcal{HS}$ is complete in the $\mathcal{HS}$-norm (but not necessarily in the operator
norm): let $(T_n)$ be an $\mathcal{HS}$-Cauchy sequence

$$\|T_n - T_m\|_{\mathcal{HS}}^2 = \sum_i \|(T_n - T_m)e_i\|^2 \to 0 \quad \text{as } n, m \to \infty,$$

then it is a Cauchy sequence in the operator norm, and thus $T_n \to T$ in $B(H)$.
But writing the Cauchy condition in a slightly different way, the sequences $x_n :=
(\|(T_n - T)e_i\|)_i$ form a Cauchy sequence in $\ell^2$,

$$\|x_n - x_m\|_{\ell^2}^2 = \sum_i \big|\,\|(T_n - T)e_i\| - \|(T_m - T)e_i\|\,\big|^2 \le \sum_i \|(T_n - T_m)e_i\|^2 \to 0,$$

as $n, m \to \infty$; so $x_n$ converges to some sequence $(a_i) \in \ell^2$. Combining $T_ne_i \to Te_i$
with $\|T_ne_i - Te_i\| \to a_i$ for all $i$, each $a_i$ must be 0, and

$$\|T_n - T\|_{\mathcal{HS}}^2 = \sum_i \|(T_n - T)e_i\|^2 = \|x_n\|_{\ell^2}^2 \to 0,$$

so $T_n \to T$ in $\mathcal{HS}$, and $T \in \mathcal{HS}$ since $\|T\|_{\mathcal{HS}} \le \|T - T_n\|_{\mathcal{HS}} + \|T_n\|_{\mathcal{HS}} < \infty$. □
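For matrices the Hilbert-Schmidt norm is the Frobenius norm, and the inequalities of the theorem can be checked numerically. The sketch assumes NumPy and random complex matrices; `hs_inner`, `hs_norm`, `op_norm` and `T_N` are names introduced here.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
S = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
T = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

def hs_inner(A, B):
    """<A, B>_HS = tr(A* B)."""
    return np.trace(A.conj().T @ B)

def hs_norm(A):
    return np.sqrt(hs_inner(A, A).real)

def op_norm(A):
    """Operator norm = largest singular value."""
    return np.linalg.norm(A, 2)

def T_N(N):
    """Keep the first N columns T e_n and set the remaining ones to zero."""
    out = np.zeros_like(T)
    out[:, :N] = T[:, :N]
    return out

# HS distance from T to its truncations T_N, for N = 0, ..., n
tail = [hs_norm(T - T_N(N)) for N in range(n + 1)]
```

The list `tail` is the finite-dimensional analogue of $\sum_{n>N}\|Te_n\|^2 \to 0$ in part (ii): it decreases to zero and dominates the operator-norm distance at every step.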

Having established a theory of Hilbert-Schmidt operators, we now exhibit an 
important specific example: 

Theorem 15.32 

If $k \in L^2(\mathbb{R}^2)$, then the operator on $L^2(\mathbb{R})$

$$Tf(x) := \int k(x, y)f(y)\,dy$$

is Hilbert-Schmidt with $\|T\|_{\mathcal{HS}} = \|k\|_{L^2}$.

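Proof Let $e_n(x)$ be any orthonormal basis for $L^2(\mathbb{R})$. Then any function of $x$ in
$L^2(\mathbb{R})$ can be written as a sum of these basis functions. Analogously any function
of two variables $x$, $y$ in $L^2(\mathbb{R}^2)$ can be written as a sum (convergent in $L^2(\mathbb{R}^2)$)

$$k(x, y) = \sum_{n,m} \alpha_{n,m}\, e_n(x)\,\overline{e_m(y)}$$

by first fixing $y$ and expanding in terms of $e_n(x)$ and then treating the result as a
function of $y$ (the conjugates $\overline{e_m}$ also form an orthonormal basis). Write $e_n \otimes \bar e_m$
for the basis functions $(x, y) \mapsto e_n(x)\overline{e_m(y)}$. They are orthonormal, since

$$\langle e_n \otimes \bar e_m, e_{n'} \otimes \bar e_{m'}\rangle = \iint \overline{e_n(x)}\,e_m(y)\,e_{n'}(x)\,\overline{e_{m'}(y)}\,dx\,dy = \langle e_n, e_{n'}\rangle\,\overline{\langle e_m, e_{m'}\rangle} = \delta_{nn'}\delta_{mm'}.$$

By Parseval's identity $\|k\|_{L^2}^2 = \iint |k(x, y)|^2\,dx\,dy = \sum_{n,m} |\alpha_{n,m}|^2$. Clearly,

$$\langle e_n, Te_m\rangle = \iint \overline{e_n(x)}\,k(x, y)\,e_m(y)\,dx\,dy = \langle e_n \otimes \bar e_m, k\rangle_{L^2(\mathbb{R}^2)} = \alpha_{n,m},$$

so

$$\|T\|_{\mathcal{HS}}^2 = \sum_m \|Te_m\|^2 = \sum_{n,m} |\langle e_n, Te_m\rangle|^2 = \sum_{n,m} |\alpha_{n,m}|^2 = \|k\|_{L^2}^2.$$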

n n,m n,m 

Examples 15.33 

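1. ► For square matrices $T = [T_{ij}]$, $S = [S_{ij}]$,

$$\operatorname{tr} T = \sum_i T_{ii}, \qquad \|T\|_{\mathcal{HS}} = \sqrt{\sum_{ij} |T_{ij}|^2}, \qquad \langle S, T\rangle_{\mathcal{HS}} = \sum_{ij} \overline{S_{ij}}\,T_{ij}.$$

2. More generally, for any Hilbert space, and using Parseval's identity,

$$\|T\|_{\mathcal{HS}}^2 = \sum_i \|Te_i\|^2 = \sum_{i,j} |\langle e_j, Te_i\rangle|^2.$$

3. ► For a Hilbert-Schmidt normal operator, $\|T\|_{\mathcal{HS}}^2 = \sum_n |\lambda_n|^2$, where $\lambda_n$ are
the eigenvalues of $T$. In this case it is evident that $\|T\|_{\mathcal{HS}} \ge \max_n |\lambda_n| = \|T\|$.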

4. Find the eigenvalues and eigenfunctions of the integral operator on $L^2[0, 1]$ with
kernel

$$k(x, y) := \begin{cases} y(1 - x) & 0 \le y \le x \le 1 \\ x(1 - y) & 0 \le x \le y \le 1 \end{cases}$$

Solution. The operator is Hilbert-Schmidt since $|k(x, y)| \le 1$. The eigenvalue
equation is

$$\int_0^x y(1 - x)f(y)\,dy + \int_x^1 x(1 - y)f(y)\,dy = \lambda f(x).$$

The eigenfunctions can be assumed to be differentiable, essentially because they
are integrals. Differentiating gives

$$x(1 - x)f(x) - \int_0^x yf(y)\,dy - x(1 - x)f(x) + \int_x^1 (1 - y)f(y)\,dy = \lambda f'(x),$$

and again, $-xf(x) - (1 - x)f(x) = \lambda f''(x)$, so

$$f''(x) + \tfrac{1}{\lambda}f(x) = 0, \qquad f(0) = 0 = f(1).$$

The solutions of this differential equation are the eigenfunctions $f_n(x) =
\sin(n\pi x)$ with eigenvalues $\lambda_n = 1/(n^2\pi^2)$.
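The eigenvalues found above can be compared with those of a discretized kernel. The sketch below assumes NumPy and a uniform grid of interior points with a rectangle-rule quadrature (a choice made for this illustration, not taken from the text); the grid size 500 is arbitrary.

```python
import numpy as np

N = 500
h = 1.0 / N
x = np.arange(1, N) * h                            # interior grid points of [0, 1]
X, Y = np.meshgrid(x, x, indexing="ij")
K = np.minimum(X, Y) * (1.0 - np.maximum(X, Y))    # k(x, y) = min(x, y) (1 - max(x, y))

eigs = np.linalg.eigvalsh(h * K)[::-1]             # eigenvalues of the discretized operator, largest first
exact = 1.0 / (np.arange(1, 6) ** 2 * np.pi ** 2)  # 1/(n^2 pi^2), n = 1..5

hs_sq = h * h * np.sum(K ** 2)                     # approximates the double integral of |k|^2
```

The leading discrete eigenvalues agree with $1/(n^2\pi^2)$ to several digits, and `hs_sq` is close to $\sum_n 1/(n^4\pi^4) = 1/90$, in line with $\|T\|_{\mathcal{HS}}^2 = \|k\|_{L^2}^2 = \sum_n |\lambda_n|^2$.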

5. A traceless operator in $B(\mathbb{C}^N)$ has a matrix with a zero diagonal, with respect to
some orthonormal basis.

Proof Let $A$ be an $N \times N$ matrix with $\operatorname{tr} A = 0$. The proof is by induction on $N$.
Since the numerical range of $A$ is convex,

$$0 = \frac{1}{N}\operatorname{tr} A = \frac{1}{N}\sum_{n=1}^{N} \lambda_n \in W(A)$$

where $\lambda_n$ are the eigenvalues of $A$. So there is a unit vector $u$ such that $\langle u, Au\rangle = 0$.
The matrix restricted to $u^\perp$, $\tilde A := A|_{u^\perp}$, is still traceless

$$0 = \operatorname{tr} A = \operatorname{tr} \tilde A + \langle u, Au\rangle = \operatorname{tr} \tilde A.$$

Therefore, by induction, there is an orthonormal basis $e_1, \ldots, e_{N-1}$ of $u^\perp$ in
which $\tilde A$ has zero diagonal, i.e., $\langle e_i, Ae_i\rangle = 0$. This basis, together with $u$, is the
required basis for the whole $N$-dimensional space.

6. O There is a correspondence between various ideals of compact operators and
the sequence spaces of their singular values $(\lambda_n)$:

    Finite-rank operators       $\mathcal{K}_f(H)$     $(\lambda_n) \in c_{00}$
    Trace-class operators       $Tr(H)$                $(\lambda_n) \in \ell^1$
    Hilbert-Schmidt operators   $\mathcal{HS}(H)$      $(\lambda_n) \in \ell^2$
    Compact operators           $\mathcal{K}(H)$       $(\lambda_n) \in c_0$
    Bounded operators           $B(H)$                 $(\lambda_n) \in \ell^\infty$

where the set of trace-class operators has been added to complete the picture
(Exercise 15.49(11)). More generally, the Schatten-von Neumann class of
operators $C_p$ corresponds to $(\lambda_n) \in \ell^p$. The analogy goes deeper than this:
$\mathcal{K}(H)^* \equiv Tr(H)$ and $Tr(H)^* \equiv B(H)$ (via the functionals $T \mapsto \operatorname{tr}(ST)$).


Exercises 15.34 


1. (a) $\langle S^*, T^*\rangle_{\mathcal{HS}} = \langle T, S\rangle_{\mathcal{HS}}$,

(b) $\langle RT^*, S\rangle_{\mathcal{HS}} = \langle R, ST\rangle_{\mathcal{HS}} = \langle S^*R, T\rangle_{\mathcal{HS}}$.

2. The closest number to an $n \times n$ matrix $T$ (in the $\mathcal{HS}$-norm) is $\operatorname{tr}(T)/n$. (Hint:
$(\lambda - T) \perp I$.)

3. The map $x \mapsto M_x$, where $M_xy := xy$, embeds $\ell^2$ into $\mathcal{HS}(\ell^2)$ (isometrically).

More generally, if $x_n \in H$ satisfy $\sum_n \|x_n\|^2 < \infty$, then $T := \sum_n x_ne_n^*$ is
Hilbert-Schmidt with $\|T\|_{\mathcal{HS}}^2 = \sum_n \|x_n\|^2$.

4. The Volterra operator on $L^2[0, 1]$, $Vf(x) := \int_0^x f$, is Hilbert-Schmidt (without
any eigenvalues).

5. If $k(x, y) = k(x - y)$ for a real function $k(x) \in L^2[0, 1]$ (Example 8.6(5)), then
$Tf := k * f$ is Hilbert-Schmidt, with eigenvalues $\hat k(n)$.


6. Find the eigenfunctions and eigenvalues of the $\mathcal{HS}$-compact self-adjoint operators
$Tf := \int_0^1 k(x, y)f(y)\,dy$ (on $L^2[0, 1]$), where


(a) $k(x, y) := x + y$,

(b) $k(x, y) := \begin{cases} 1 & 1 - x \le y \le 1 \\ 0 & 0 \le y < 1 - x \end{cases}$


(c) $k(x, y) := \min(x, y)$; deduce that $\sum_n \frac{1}{(2n+1)^2} = \frac{\pi^2}{8}$ and $\sum_n \frac{1}{(2n+1)^4} = \frac{\pi^4}{96}$.
7. In the original Fredholm theory, it was proved under certain hypotheses that the
equation

$$f(x) + \int_a^b k(x, y)f(y)\,dy = g(x)$$

either has a unique solution, or else the same equation with $g = 0$ admits a
finite number of linearly independent solutions. Show this for $f, g \in L^2(\mathbb{R})$,
$k \in L^2(\mathbb{R}^2)$, using Proposition 14.17.


15.4 Representation Theorems 

We return to a general unital C*-algebra $\mathcal{X}$ and recover some of the previous
propositions in this setting. The aim is to widen the functional calculus for normal elements
and to prove that $\mathcal{X}$ is embedded in $B(H)$ for some Hilbert space $H$.

Proposition 15.35


For any $\phi \in \mathcal{S}(\mathcal{X})$, $T \in \mathcal{X}$,

$$\phi T^* = \overline{\phi T}, \qquad \widehat{T^*} = \overline{\hat T}.$$


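Proof If $A$ is self-adjoint and $t \in \mathbb{R}$, then

$$\|A + it\|^2 = \|(A + it)^*(A + it)\| = \|A^2 + t^2\| \le \|A\|^2 + t^2.$$

(As a matter of fact, equality holds as the accompanying diagram shows.)

[Diagram: the spectrum $\sigma(A)$ on the real axis and its translate $\sigma(A + it)$.]

Writing $\phi A =: a + ib$, we find

$$|b + t| \le |a + ib + it| = |\phi(A + it)| \le \|A + it\| \le \sqrt{\|A\|^2 + t^2}$$
$$\therefore\ (2t + b)b \le \|A\|^2 \quad \text{for } t \in \mathbb{R}$$

so $b = 0$ and $\phi A \in \mathbb{R}$. Note that $\Delta(A) \subseteq \sigma(A) \subseteq \mathcal{S}(A) \subseteq \mathbb{R}$. More generally, for
any $T = A + iB \in \mathcal{X}$, with $A$, $B$ self-adjoint,

$$\phi T^* = \phi(A - iB) = \phi A - i\phi B = \overline{\phi A + i\phi B} = \overline{\phi T}.$$

In particular, every $\psi \in \Delta$ is automatically a *-morphism, and
$\widehat{T^*}(\psi) = \psi T^* = \overline{\psi T} = \overline{\hat T(\psi)}$. □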
Theorem 15.36 The Functional Calculus for Normal Elements


When $T$ is normal, $\mathcal{T} := \overline{\mathbb{C}[T, T^*]}$ is a commutative closed *-subalgebra
of $\mathcal{X}$, isometrically *-isomorphic to $C(\sigma(T))$.

The identity $\widehat{f(T)} = f \circ \hat T$ defines a normal element $f(T)$ whenever
$f \in C(\sigma(T))$; then $\sigma(f(T)) = f(\sigma(T))$.


Proof $\mathcal{T}$ is a commutative closed *-subalgebra of $\mathcal{X}$: Since $T$ is normal, $T^n(T^*)^m =
(T^*)^mT^n$ (by induction), so it should be obvious that (i) any polynomial in $T$ and
$T^*$ can be written in the form $\sum_{n,m} a_{n,m}T^n(T^*)^m$, (ii) the product (and
addition) of two polynomials in $T$ and $T^*$ is another polynomial, (iii) this product
commutes, and (iv) the involute of a polynomial $p(T, T^*)$ remains in $\mathcal{T}$,

$$p(T, T^*)^* = \Big(\sum_{n,m} a_{n,m}T^n(T^*)^m\Big)^* = \sum_{n,m} \overline{a_{n,m}}\,T^m(T^*)^n \in \mathbb{C}[T, T^*].$$

$\mathbb{C}[T, T^*]$ is thus a commutative *-subalgebra. The closure of such a subalgebra in $\mathcal{X}$
remains a commutative *-subalgebra (Prove!). Note that $\mathcal{T}$ is obviously separable.

The spectrum of $S \in \mathcal{Y}$, with respect to a closed *-subalgebra $\mathcal{Y} \subseteq \mathcal{X}$, is
$\sigma(S)$: Clearly, if $S$ (or $S - \lambda$) is invertible in $\mathcal{Y}$, it remains so in $\mathcal{X}$. Conversely,
if $S$ is invertible in $\mathcal{X}$, then so are $S^*$, $S^*S$ and $SS^*$. But $S^*S$ is self-adjoint, with
a real spectrum (in $\mathcal{Y}$ and $\mathcal{X}$), hence $S^*S + i/n$ is invertible in $\mathcal{Y}$. As $\mathcal{Y}$ is closed
and $(S^*S + i/n)^{-1} \to (S^*S)^{-1}$ in $\mathcal{X}$, as $n \to \infty$, we can deduce $(S^*S)^{-1} \in \mathcal{Y}$.
Similarly $(SS^*)^{-1} \in \mathcal{Y}$, implying $S$ is invertible in $\mathcal{Y}$ (Exercise 15.3(4)).

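$\hat T : \Delta_{\mathcal{T}} \to \sigma(T)$ is a homeomorphism: ($\Delta_{\mathcal{T}}$ is the character space of $\mathcal{T}$.) $\hat T$ is
1-1 since suppose $\hat T(\psi_1) = \hat T(\psi_2)$ for some $\psi_1, \psi_2 \in \Delta_{\mathcal{T}}$, i.e., $\psi_1T = \psi_2T$. Then

$$\psi_1T^* = \overline{\psi_1T} = \overline{\psi_2T} = \psi_2T^*$$
$$\therefore\ \psi_1\,p(T, T^*) = \sum_{n,m} a_{n,m}(\psi_1T)^n(\psi_1T^*)^m = \sum_{n,m} a_{n,m}(\psi_2T)^n(\psi_2T^*)^m = \psi_2\,p(T, T^*)$$

for any polynomial $p$; finally, by continuity of $\psi_1$ and $\psi_2$, $\psi_1S = \psi_2S$ for all $S \in \mathcal{T}$,
proving $\psi_1 = \psi_2$. That $\hat T$ is onto was proved in Theorem 14.38. It is continuous
because

$$\psi_n \to \psi \;\Rightarrow\; \hat T(\psi_n) = \psi_nT \to \psi T = \hat T(\psi).$$

So $\hat T$ is a homeomorphism since $\Delta_{\mathcal{T}}$ is a compact metric space (Proposition 6.17
and Example 14.35(8)). Hence any $z \in \sigma(T)$ corresponds uniquely to some $\psi \in \Delta_{\mathcal{T}}$
via $z = \hat T(\psi) = \psi T$.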
The Gelfand transform $\mathcal{G} : \mathcal{T} \to C(\Delta_{\mathcal{T}}) \cong C(\sigma(T))$ is an isometric
*-isomorphism: Recall that $\mathcal{G}$ is a Banach algebra morphism (Theorem 14.37). In
a commutative C*-algebra such as $\mathcal{T}$, every element $S \in \mathcal{T}$ is normal, so $\|\hat S\|_C =
\rho(S) = \|S\|$ (Theorem 14.38); furthermore $\widehat{S^*} = \overline{\hat S}$, and the Gelfand transform is
an isometric *-embedding.

In fact it is onto: for any polynomial $p$, $p(T, T^*)$ is mapped by it to $p(z, \bar z)$ when
regarded as a function on $\sigma(T)$. By the Stone-Weierstraß theorem, these polynomials
are dense in $C(\sigma(T))$. Hence, since $\mathcal{G}$ is isometric, it extends to $\mathcal{T} \to C(\Delta_{\mathcal{T}})$.

The continuous function calculus: The correspondence between elements in $\mathcal{T}$ and
functions in $C(\Delta_{\mathcal{T}})$ allows us to extend the analytic function calculus established
earlier. For any continuous function $f \in C(\sigma(T))$, the composition $f \circ \hat T : \Delta_{\mathcal{T}} \to \mathbb{C}$
corresponds to some (normal) element in $\mathcal{T}$ which is denoted by $f(T)$. By this
definition, $\widehat{f(T)} = f \circ \hat T$. The following identities are true because they mirror the
same properties in $C(\Delta_{\mathcal{T}})$,

$$(f + g)(T) = f(T) + g(T), \quad (\lambda f)(T) = \lambda f(T), \quad (fg)(T) = f(T)g(T), \quad \bar f(T) = f(T)^*.$$

Finally $\|f(T)\| = \|f\|_C$ is due to $\mathcal{G}$ being an isometry and $g \circ f(T) = g(f(T))$
follows after

$$\sigma(f(T)) = \operatorname{im} \widehat{f(T)} = \operatorname{im} f \circ \hat T = f(\operatorname{im} \hat T) = f(\sigma(T)). \quad □$$
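For a normal matrix the continuous function calculus reduces to applying $f$ to the eigenvalues while keeping the orthonormal eigenvectors. The sketch below assumes NumPy and a normal matrix built from a random unitary `Q` and random complex eigenvalues `lam`; the functions `f` and `g` are arbitrary continuous, non-analytic choices, and `calc` is a helper name introduced here.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))  # unitary
lam = rng.normal(size=n) + 1j * rng.normal(size=n)                          # the spectrum of T
T = (Q * lam) @ Q.conj().T                                                  # a normal matrix

def calc(f):
    """f(T): apply f to the eigenvalues, keep the eigenvectors."""
    return (Q * f(lam)) @ Q.conj().T

f = lambda z: np.abs(z) + np.conj(z) ** 2      # continuous but not analytic
g = lambda z: 1.0 / (1.0 + np.abs(z))

fT = calc(f)
spectrum_fT = np.linalg.eigvals(fT)            # should be f(sigma(T)) up to ordering
```

With $f(z) = \bar z$ the calculus returns $T^*$, products of functions go to products of matrices, and the operator norm of `fT` is $\max |f(\lambda)|$ over the spectrum, matching $\|f(T)\| = \|f\|_C$.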

Examples 15.37 

1. To take a simple example, consider a $2 \times 2$ diagonalizable matrix $T$ with distinct
eigenvalues $\lambda_i$ and corresponding orthonormal eigenvectors $v_i$, $i = 1, 2$. Its
character space $\Delta_{\mathcal{T}}$ consists of the two morphisms $\psi_iS := \langle v_i, Sv_i\rangle$ for $S \in \mathcal{T}$. The
Gelfand transform takes $T$ to $(\lambda_1, \lambda_2)$; any other matrix $f(T)$ is simultaneously
'diagonalized' to $(f(\lambda_1), f(\lambda_2))$.

2. ► For any elements $S_1, S_2 \in \mathcal{T}$,

$$\sigma(S_1 + S_2) \subseteq \sigma(S_1) + \sigma(S_2), \qquad \sigma(S_1S_2) \subseteq \sigma(S_1)\sigma(S_2).$$

Proof As $\mathcal{T}$ is commutative, Theorem 14.38 shows that $\sigma(S) = \Delta_{\mathcal{T}}S$ for any
$S \in \mathcal{T}$. Hence the statements follow from Exercise 14.40(10b).

3. If $S$, $T$ are commuting normal elements, and $f \in C(\sigma(S))$, $g \in C(\sigma(T))$, then
$f(S)g(T) = g(T)f(S)$.

Proof Take polynomials $p$ and $q$, in $z$ and $z^*$, then $p(S, S^*)q(T, T^*) = q(T, T^*)
p(S, S^*)$ since they are sums of terms of the form

$$aS^nS^{*m}T^iT^{*j} = aT^iT^{*j}S^nS^{*m}$$

by an application of Fuglede's theorem. Taking the limit of polynomials converging
to $f$, $g$ (by the Stone-Weierstrass theorem) gives the required result.
4. The self-adjoint elements of $\mathcal{T}$ correspond to the real-valued functions $f \in
C(\Delta_{\mathcal{T}})$ and form a real Banach algebra, while the unitary elements correspond
to functions with unit absolute value, $|f| = 1$.

5. Every commutative unital C*-algebra is isometrically *-isomorphic to $C(\Delta)$,
via the Gelfand map. The algebras $C(K)$, with $K$ a compact metric space, are
therefore typical separable commutative C*-algebras.

Proposition 15.38 


For $T$ normal,

$T$ is unitary $\Leftrightarrow$ $\sigma(T) \subseteq e^{i\mathbb{R}}$,
$T$ is self-adjoint $\Leftrightarrow$ $\sigma(T) \subseteq \mathbb{R}$.


Proof (i) The spectrum of a unitary element $U$ must lie in the unit closed ball since
$\|U\| = 1$. Now, $U - \lambda = U(1 - \lambda U^*)$ and $\|\lambda U^*\| = |\lambda|\,\|U^*\| = |\lambda|$; so $|\lambda| < 1$
implies $1 - \lambda U^*$, and thus $U - \lambda$, are invertible (Theorem 13.20).

(Equivalently, if $\lambda \in \sigma(U)$ then $\lambda^{-1} \in \sigma(U^{-1}) = \sigma(U^*) = \sigma(U)^*$ and so both
$|\lambda|$ and $1/|\lambda|$ are at most 1.)

(ii) We have already seen that $\mathcal{S}(T) \subseteq \mathbb{R}$ when $T$ is self-adjoint, and $\mathcal{S}(T)$ includes
$\sigma(T)$. (Alternatively, $e^{iT}$ is unitary (Example 15.5(11)) and the spectral mapping
theorem gives $e^{i\sigma(T)} = \sigma(e^{iT}) \subseteq e^{i\mathbb{R}}$. But $|e^{i(a+ib)}| = e^{-b}$ is 1 only when $b = 0$,
from which follows that $\sigma(T) \subseteq \mathbb{R}$.)

(iii) For the converses, let $T$ be normal with $\sigma(T) \subseteq \mathbb{R}$. Writing it as $A + iB$ with
$A$, $B$ commuting self-adjoint, we see that $iB = T - A$, so

$$\sigma(iB) \subseteq \sigma(T) + \sigma(-A) \subseteq \mathbb{R} \qquad \text{(Example 2 above)}$$

yet $\sigma(iB) = i\sigma(B) \subseteq i\mathbb{R}$. Thus $\sigma(B) = \{0\}$, $B = 0$, and $T = A$ is self-adjoint.

(Alternatively, we can work with $\mathcal{S}$: if $T$ is normal and $\sigma(T)$ is real, then $\mathcal{S}(T) \subseteq
\mathbb{R}$; for any $\phi \in \mathcal{S}$, $\phi(T - T^*) = \phi T - \overline{\phi T} = 0$, hence $T - T^* = 0$.)

(iv) If $T$ is normal with $\sigma(T) \subseteq e^{i\mathbb{R}}$, then

$$\sigma(T^*T) \subseteq \sigma(T^*)\sigma(T) = \sigma(T)^*\sigma(T) \subseteq e^{i\mathbb{R}}.$$

As $T^*T$ is self-adjoint and has a real spectrum, that leaves only $\pm 1$ as possible
spectral values. But $1 + T^*T$ is invertible, otherwise there is a $\psi \in \Delta_{\mathcal{T}}$ such that

$$-1 = \psi(T^*T) = \overline{\psi T}\,\psi T = |\psi T|^2,$$

a contradiction. So $\sigma(T^*T) = \{1\}$, $1 = T^*T = TT^*$ and $T$ is unitary. □
Exercises 15.39

1. Find an example of an operator $T$ having a real spectrum, without $T$ being
self-adjoint.

2. If $J$ is a *-morphism and $T$ is normal, then $J(f(T)) = f(J(T))$ (first prove,
for any polynomial $p$, $J(p(T, T^*)) = p(J(T), J(T)^*)$).

3. ► In a C*-algebra, $\mathcal{S}(T) = 0 \Rightarrow T = 0$ (write $T = A + iB$). We say that $\mathcal{S}(\mathcal{X})$
separates points of $\mathcal{X}$: if $T \ne S$, then there is a $\phi \in \mathcal{S}$ such that $\phi T \ne \phi S$.

4. Suppose a C*-algebra has two involutions, $*$ and $\star$ (with the same norm).
Show that $T^* = T^\star$ for all $T$, i.e., the involution is unique. (Hint: $\phi(T^*) = \overline{\phi T}
= \phi(T^\star)$.)

5. Every normal cyclic element is unitary. In particular, the normal elements of a
finite subgroup of $\mathcal{G}(\mathcal{X})$ are unitary.

6. The Fourier transform $\mathcal{F} : L^2(\mathbb{R}) \to L^2(\mathbb{R})$ is unitary; in fact it is cyclic
$\mathcal{F}^4 = 1$, so that it has four eigenvalues $\pm 1$, $\pm i$. Verify that the following are
eigenfunctions: $e^{-\pi x^2}$, $xe^{-\pi x^2}$, $(4\pi x^2 - 1)e^{-\pi x^2}$, $(4\pi x^3 - 3x)e^{-\pi x^2}$.

7. A normal $T$ such that $\|T\| = 1 = \|T^{-1}\|$ is unitary.

8. Normal idempotents are self-adjoint. A normal element $T$ with $\sigma(T) \subseteq \{0, 1\}$
is an idempotent, e.g. when $T$ is normal and $T^{n+1} = T^n$ for some integer $n$.

9. Suppose $M$ is a closed subspace of a Hilbert space which is invariant under a
group of unitary operators. Show that $M^\perp$ is also invariant.

10. If $T_n$ are self-adjoint operators and $T_n \to T$ then $T$ is self-adjoint.


Positive Self-Adjoint Elements 

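For $T$, $S$ self-adjoint, let $T \le S$ be defined to mean $\sigma(S - T) \subseteq [0, \infty[$. Equivalently,
since $\mathcal{S}(\mathcal{X})(S - T)$ is the closed convex hull of $\sigma(S - T)$ (Proposition 15.8),

$$T \le S \;\Leftrightarrow\; \forall \phi \in \mathcal{S}(\mathcal{X}),\ \phi T \le \phi S.$$


Proposition 15.40


The self-adjoint elements form an ordered real Banach space, such that

$$T \le S \text{ AND } R \le Q \;\Rightarrow\; T + R \le S + Q,$$

$$T \le S \;\Rightarrow\; R^*TR \le R^*SR \quad \forall R \in \mathcal{X}.$$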
Proof First note that, by the definition, $T \le S \Leftrightarrow 0 \le S - T \Leftrightarrow T - S \le 0$
($\Leftrightarrow -S \le -T$), so we might as well consider $A := S - T \ge 0$ and $B := Q - R \ge 0$
in proving some of the assertions.

(i) It is trivially true that self-adjoint elements form a real vector subspace

$$(S + T)^* = S^* + T^* = S + T, \qquad (\lambda T)^* = \lambda T^* = \lambda T, \quad \forall \lambda \in \mathbb{R}.$$

If $T_n \to T$ with $T_n^* = T_n$, then in the limit, $T^* = T$, so the subspace is closed.

(ii) That $T \le T$ is immediate from $\sigma(0) = \{0\}$. For anti-symmetry, note that

$$0 \le A \le 0 \;\Rightarrow\; \sigma(A) = \{0\} \;\Rightarrow\; \|A\| = \rho(A) = 0 \;\Rightarrow\; A = 0,$$

so $S \le T \le S \Rightarrow T = S$.

(iii) To facilitate the rest of the proof, we demonstrate

$$a \le T \le b \;\Leftrightarrow\; \sigma(T) \subseteq [a, b] \qquad (15.3)$$

in two parts,

$$a \le T \;\Leftrightarrow\; \sigma(T) - a = \sigma(T - a) \subseteq [0, \infty[ \;\Leftrightarrow\; \sigma(T) \subseteq [a, \infty[$$
$$T \le b \;\Leftrightarrow\; \sigma(T) - b = \sigma(T - b) \subseteq\ ]{-\infty}, 0] \;\Leftrightarrow\; \sigma(T) \subseteq\ ]{-\infty}, b].$$

In particular, note that $T \le \rho(T) = \|T\|$ and that if $0 \le T \le b$ then $\rho(T) \le b$.

(iv) $A, B \ge 0 \Rightarrow A + B \ge 0$: In general,

$$C + D \le \|C + D\| \le \|C\| + \|D\| = \rho(C) + \rho(D).$$

Let $a := \rho(A)$, then $0 \le A \le a$ can be rewritten as $0 \le a - A \le a$ and hence
$\rho(a - A) \le a$. Similarly $\rho(b - B) \le b := \rho(B)$, so $(a - A) + (b - B) \le a + b$, or
equivalently, $A + B \ge 0$.

(v) A special case of this shows transitivity of the order relation,

$$T \le S \le R \;\Rightarrow\; 0 \le (R - S) + (S - T) = R - T \;\Rightarrow\; T \le R.$$

(vi) We are not at this stage able to prove the full product-inequality rule as claimed
in the proposition. The proof is deferred to the next proposition. Here we show
only the simple case when $R$ is scalar, i.e., if $\lambda \ge 0$ and $A = S - T \ge 0$, then
$\sigma(\lambda A) = \lambda\sigma(A) \subseteq \mathbb{R}^+$. □

The continuous functional calculus allows us to extend the domain of all continuous
real functions $f : \mathbb{R} \to \mathbb{R}$ to the set of self-adjoint elements. Two functions in
particular stand out:

(i) the positive square root $\sqrt{A}$ when $A \ge 0$, satisfying $(\sqrt{A})^2 = A = \sqrt{A^2}$, and

(ii) $A_+$ for all $A$ self-adjoint, from the function

$$x_+ := \begin{cases} x & \text{when } x \ge 0 \\ 0 & \text{when } x < 0 \end{cases};$$

similarly $A_-$ from $x_- := (-x)_+$. Their sum then gives $|A|$, which corresponds to the
function $x \mapsto |x|$.


Examples 15.41 

1. (a) If $-T \le S \le T$ then $\|S\| \le \|T\|$.

(b) If $0 < a \le T \le b$ then $T$ is invertible and $b^{-1} \le T^{-1} \le a^{-1}$.

(c) If $ST \ge 0$ then $TS \ge 0$.

(d) If $S, T \ge 0$ and $ST$ is self-adjoint, then $ST \ge 0$. In particular, $T \ge 0 \Rightarrow
T^n \ge 0$.

(e) If $S_n \le T_n$ and $S_n \to S$, $T_n \to T$, then $S \le T$.

Proof (a) $-\|T\| \le S \le \|T\|$, so $\sigma(S) \subseteq [-\|T\|, \|T\|]$ and $\|S\| = \rho(S) \le \|T\|$.

(b) $\sigma(T) \subseteq [a, b]$ does not include 0; $\sigma(T^{-1}) = \sigma(T)^{-1} \subseteq [b^{-1}, a^{-1}]$.

(c) $\sigma(TS)$ is the same as $\sigma(ST)$ except possibly for the inclusion or exclusion
of 0. In any case $\sigma(ST) \subseteq \mathbb{R}^+ \Leftrightarrow \sigma(TS) \subseteq \mathbb{R}^+$.

(d) Recall that $ST$ is self-adjoint exactly when $ST = TS$. So, by
Exercise 14.40(17), $\sigma(ST) \subseteq \sigma(S)\sigma(T) \subseteq \mathbb{R}^+$.

(e) Let $A_n := T_n - S_n \ge 0$ and $A_n \to A := T - S$. Then $0 \le \phi A_n \to \phi A$ for
any $\phi \in \mathcal{S}$, so $\mathcal{S}(A) \subseteq [0, \infty[$.

2. The set of positive elements is a closed convex 'cone' (meaning $T \ge 0$ and
$\lambda \ge 0 \Rightarrow \lambda T \ge 0$), with non-empty interior in the real Banach space of
self-adjoints.

Proof The only non-trivial statement is that the cone contains an open set of
self-adjoints, namely the unit ball around 1: If $A$ is self-adjoint and $\|A\| < 1$
then $-1 \le A \le 1$, so $1 + A \ge 0$.

3. Positive continuous functions $f : \mathbb{R} \to \mathbb{R}^+$ give positive elements $f(A) \ge 0$
for $A$ self-adjoint. For example, $A_+$, $A_-$, $|A|$, and $A^2$ are all positive. More
generally, for any normal operator $T$ and $f \in C(\mathbb{C}, \mathbb{R}^+)$, $f(T) \ge 0$.


Proof By the functional calculus, $\sigma(f(T)) = f\sigma(T) \subseteq [0, \infty[$.

4. Every self-adjoint element decomposes into two positive elements

(a) $A = A_+ - A_-$, $|A| = A_+ + A_-$,

(b) $A_+A_- = 0$, $A_\pm|A| = A_\pm^2$, $A_\pm A = \pm A_\pm^2$, and $A_+$, $A_-$, $A$ and $|A|$ all
commute with each other,

Proof The identities $x = x_+ - x_-$, $|x| = x_+ + x_-$, $x_+x_- = 0$, $x_\pm|x| = x_\pm^2$,
$x_\pm x = \pm x_\pm^2$ imply (a) and (b). Moreover, $A + A_- = A_+ \ge 0$, $|A| - A_+ =
A_+ - A = A_- \ge 0$. Finally, $\sigma(|A|) = \{\, |\lambda| : \lambda \in \sigma(A) \,\}$ is bounded above by
$\rho(A) = \|A\|$.
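For a self-adjoint matrix the elements $A_+$, $A_-$ and $|A|$ are obtained by applying $x_+$, $x_-$ and $|x|$ to the eigenvalues. The following sketch assumes NumPy and a random self-adjoint matrix; `calc` is a helper name introduced for the example.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 5
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (B + B.conj().T) / 2                        # self-adjoint

w, V = np.linalg.eigh(A)                        # A = V diag(w) V*

def calc(f):
    """f(A) via the functional calculus: apply f to the eigenvalues."""
    return (V * f(w)) @ V.conj().T

A_plus = calc(lambda x: np.maximum(x, 0.0))     # from x_+
A_minus = calc(lambda x: np.maximum(-x, 0.0))   # from x_- = (-x)_+
A_abs = calc(np.abs)                            # from |x|
```

The identities (a) and (b) then hold up to rounding, and both parts have non-negative spectrum.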

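5. By the spectral mapping theorem, the spectral values of $\sqrt{A}$ are the positive
square roots of those of $A \ge 0$. Overall there may be an infinite number of
square roots of $A$, e.g. for any $z \in \mathbb{C}$,

$$\begin{pmatrix} z & 1 + z \\ 1 - z & -z \end{pmatrix}^2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

6. $0 \le S \le T \Rightarrow \sqrt{S} \le \sqrt{T}$.

Proof If $T$ is invertible, then $T^{-1/2}ST^{-1/2} \le 1$ (Proposition 15.40), so $\|S^{1/2}T^{-1/2}\|^2
= \|T^{-1/2}ST^{-1/2}\| \le 1$, from which follows $T^{-1/4}S^{1/2}T^{-1/4} \le 1$ and $S^{1/2} \le T^{1/2}$.

Proposition 15.42


For any $T \in \mathcal{X}$ and $\phi \in \mathcal{S}(\mathcal{X})$,

(i) $T^*T \ge 0$,

(ii) $T \ge 0 \Leftrightarrow T = R^*R$, for some $R \in \mathcal{X}$,

(iii) $\langle S, T\rangle := \phi(S^*T)$ gives a semi-inner product,

(iv) $|\phi(S^*T)|^2 \le \phi(S^*S)\phi(T^*T)$, $|\phi T|^2 \le \phi(T^*T)$,

(v) $|\phi(S^*TS)| \le \phi(S^*S)\|T\|$.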

Proof (i) $T^*T$ is certainly self-adjoint, and can be decomposed as $T^*T = A - B$ where $A, B \geqslant 0$, $AB = BA = 0$ (Example 4b above). Now

$$(TB)^*(TB) = BT^*TB = B(A - B)B = -B^3 \leqslant 0$$

and hence $(TB)(TB)^* \leqslant 0$ (Examples 15.41(1c)). Writing $TB = C + iD$, with $C$, $D$ self-adjoint, we find

$$0 \leqslant 2(C^2 + D^2) = (TB)^*(TB) + (TB)(TB)^* \leqslant 0$$
$$\therefore\ 0 \leqslant C^2 = -D^2 \leqslant 0$$
$$\therefore\ C = 0 = D$$

so $TB = 0$. But then, $0 = (TB)^*(TB) = -B^3$ forces $B = 0$ and $T^*T = A \geqslant 0$.

This allows us to conclude the proof of Proposition 15.40(vi). If $T \leqslant S$ let $A := S - T \geqslant 0$, so for any $R \in X$, $R^*AR = (\sqrt{A}R)^*(\sqrt{A}R) \geqslant 0$, i.e., $R^*TR \leqslant R^*SR$.

(ii) Conversely, if $T$ is positive, let $R := \sqrt{T} \geqslant 0$, so $R^*R = R^2 = T$.

(iii) The product satisfies the following inner-product axioms,

$$\langle S, \lambda T_1 + \mu T_2\rangle = \phi(\lambda S^*T_1 + \mu S^*T_2) = \lambda\langle S, T_1\rangle + \mu\langle S, T_2\rangle,$$
$$\langle T, S\rangle = \phi(T^*S) = \phi((S^*T)^*) = \overline{\phi(S^*T)} = \overline{\langle S, T\rangle},$$
$$\langle T, T\rangle = \phi(T^*T) \geqslant 0 \text{ since } T^*T \geqslant 0.$$

However, it need not be definite, i.e., $\phi(T^*T) = 0$ may be possible without $T = 0$.

(iv) This is the Cauchy-Schwarz inequality, which is valid even for semi-definite inner products (Example 10.10(17)). In particular, taking $S = 1$ gives the second inequality.

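(v) As $\phi$ preserves inequalities,

$$T^*T \leqslant \|T^*T\| = \|T\|^2 \ \Rightarrow\ S^*T^*TS \leqslant \|T\|^2 S^*S \ \Rightarrow\ \phi(S^*T^*TS) \leqslant \phi(S^*S)\|T\|^2.$$
$$\therefore\ |\phi(S^*(TS))|^2 \leqslant \phi(S^*S)\,\phi(S^*T^*TS) \quad \text{by (iv)},$$
$$\leqslant \phi(S^*S)^2\|T\|^2.$$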

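The inequalities in (iv) and the bound reached at the end of the proof of (v) can be sanity-checked numerically. The following is only a sketch under stated assumptions: the algebra is taken to be the $2 \times 2$ complex matrices, the state is the vector state $\phi(T) = \langle x, Tx\rangle$ for a unit vector $x$, and the matrices `S`, `T` and the vector `x` are arbitrary illustrative choices, not taken from the text.

```python
# Sketch: check Proposition 15.42(iv),(v) for 2x2 complex matrices and the
# vector state phi(T) = <x, Tx>. S, T and x are arbitrary illustrative choices.

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adjoint(A):
    """Conjugate transpose A*."""
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]

def op_norm(A):
    """Operator norm: square root of the largest eigenvalue of A*A."""
    P = matmul(adjoint(A), A)
    tr = (P[0][0] + P[1][1]).real
    det = (P[0][0] * P[1][1] - P[0][1] * P[1][0]).real
    return ((tr + max(tr * tr - 4 * det, 0.0) ** 0.5) / 2) ** 0.5

def vector_state(x):
    """Return phi with phi(T) = <x, Tx>, conjugate-linear in the first slot."""
    def phi(T):
        Tx = [T[i][0] * x[0] + T[i][1] * x[1] for i in range(2)]
        return (complex(x[0]).conjugate() * Tx[0]
                + complex(x[1]).conjugate() * Tx[1])
    return phi

x = [0.6, 0.8j]                      # unit vector, so phi(1) = 1
phi = vector_state(x)
S = [[1, 2j], [0, -1]]
T = [[2, 1], [1j, 3]]

S_star_S = matmul(adjoint(S), S)
T_star_T = matmul(adjoint(T), T)

# (iv): |phi(S*T)|^2 <= phi(S*S) phi(T*T)
lhs_iv = abs(phi(matmul(adjoint(S), T))) ** 2
rhs_iv = phi(S_star_S).real * phi(T_star_T).real

# (v): |phi(S*TS)| <= phi(S*S) ||T||
lhs_v = abs(phi(matmul(adjoint(S), matmul(T, S))))
rhs_v = phi(S_star_S).real * op_norm(T)

print(lhs_iv <= rhs_iv, lhs_v <= rhs_v)
```

A vector state is used only because it is the simplest state to write down on a matrix algebra; any other state would do for the same check.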

Proposition 15.43 

If $J : X \to Y$ is an algebraic *-morphism between C*-algebras, then it is continuous with $\|J\| = 1$, and preserves $\leqslant$.

If $J$ is also 1-1, then it is isometric.


By an algebraic *-morphism is meant a map which preserves $+$, $\cdot$, $1$, and $*$.

Proof If $A \geqslant 0$, then $A = R^*R$ and $J(A) = J(R)^*J(R) \geqslant 0$. Thus $J$ preserves the order of self-adjoint elements.

$$S \leqslant T \ \Rightarrow\ J(T - S) \geqslant 0 \ \Rightarrow\ J(S) \leqslant J(T).$$

Now for any $T$ (noting that $J(1) = 1$),

$$0 \leqslant T^*T \leqslant \|T\|^2,$$
$$0 \leqslant J(T^*T) \leqslant \|T\|^2,$$
$$\therefore\ \|J(T)\| = \|J(T)^*J(T)\|^{1/2} = \|J(T^*T)\|^{1/2} \leqslant \|T\|.$$

If $J$ is 1-1, then one can form the ‘inverse’ $J^{-1} : \operatorname{im} J \to X$. It is automatically an algebraic *-morphism (check!), for example, for any $S = JT \in \operatorname{im} J$,

$$J^{-1}(S^*) = J^{-1}((JT)^*) = J^{-1}J(T^*) = T^* = (J^{-1}(JT))^* = (J^{-1}S)^*,$$

and so $\|J^{-1}(S)\| \leqslant \|S\|$. Thus $\|T\| \leqslant \|J(T)\| \leqslant \|T\|$ as required.
(Alternatively, defining $|||T||| := \|J(T)\|_Y$ gives a C*-norm on $X$. But there can only be one C*-norm (Exercise 15.10(6)), so $J$ is an isometry and $\operatorname{im} J$ is closed.) □

Exercises 15.44 

1. $0 \leqslant 1$ (as self-adjoint elements), and the order relation of $\mathbb{R}$ is subsumed in that of the self-adjoint elements. Similarly, in $C[0, 1]$, $f \leqslant g \Leftrightarrow \forall x,\ f(x) \leqslant g(x)$.

2. /) < (li) in $B(\mathbb{C}^2)$. Note that $T \leqslant S$ does not mean “$\sigma(T) \leqslant \sigma(S)$” in general.

3. (a) A diagonal matrix is positive when all its diagonal coefficients are real and positive.

(b) If the coefficients of a real symmetric matrix are positive, it does not follow that it is positive: $\forall i, j,\ A_{ij} \geqslant 0 \not\Rightarrow A \geqslant 0$.

(c) But if a real symmetric matrix is dominated by its positive diagonal, meaning $A_{ii} \geqslant \sum_{j \neq i} |A_{ij}|$, then $A \geqslant 0$ (Gershgorin’s theorem, Examples 14.10(6)).

4. Show $\operatorname{Re}(T) \geqslant 0 \Leftrightarrow \operatorname{Re}\mathcal{S}(T) \geqslant 0$.

5. The similarity between self-adjoints and real numbers is striking. But not every property about inequalities of real numbers carries through to self-adjoints:

(a) Not every two self-adjoints $S$ and $T$ are comparable, e.g. $T :=$
satisfies neither $T \geqslant 0$ nor $T \leqslant 0$;

(b) $0 \leqslant S \leqslant T$ does not imply $S^2 \leqslant T^2$ (unless $S$, $T$ commute), e.g. $S :=$

(n)'-(n) 

6. In $B(H)$, $S \leqslant T \Leftrightarrow \langle x, Sx\rangle \leqslant \langle x, Tx\rangle$ for all $x \in H$. In particular, $S^*S \leqslant T^*T \Leftrightarrow \|Sx\| \leqslant \|Tx\|$ for all $x \in H$ (e.g. $T^*T \geqslant 0$); deduce

(a) If $T$ is compact then so is $S$,

(b) If $T$ is Hilbert-Schmidt, then so is $S$,

(c) For self-adjoint projections in $B(H)$, $P \leqslant Q$ when $\operatorname{im} P \subseteq \operatorname{im} Q$.

7. Prove $S \leqslant T \Rightarrow R^*SR \leqslant R^*TR$ for all $R$, in $B(H)$.

8. In $B(H)$, if $T \geqslant 0$ then $\langle\langle x, y\rangle\rangle := \langle x, Ty\rangle$ is “almost” an inner product on $H$, except that it need not be definite; it still satisfies the Cauchy-Schwarz inequality though,

$$|\langle x, Ty\rangle|^2 \leqslant \langle x, Tx\rangle\langle y, Ty\rangle.$$

Conversely, every bounded inner product $\langle\langle\,,\rangle\rangle$ on $H$, in the sense that $|\langle\langle x, y\rangle\rangle| \leqslant c\|x\|\|y\|$, is of this type. Use Example 11.21(1c) to deduce that, for all $x \in H$,

$$\|Tx\| \leqslant \sqrt{\|T\|}\sqrt{\langle x, Tx\rangle}.$$

In particular, $\langle x, Tx\rangle = 0 \Leftrightarrow Tx = 0$.

9. If $f : \mathbb{R} \to \mathbb{R}$ is increasing and $a \leqslant T \leqslant b$ then $f(a) \leqslant f(T) \leqslant f(b)$.

10. To calculate $f(A)$ for a positive self-adjoint matrix $A$, first diagonalize it $A = PDP^{-1}$, then work out $f(A) = Pf(D)P^{-1}$. For example,

11. There exists $A^\alpha \geqslant 0$ for $\alpha > 0$ when $A \geqslant 0$, for which $(A^\alpha)^{1/\alpha} = A$.

12. If $-1 \leqslant A \leqslant 1$ then $A + i\sqrt{1 - A^2}$ is unitary. Hence any $T \in X$ is the linear combination of at most four unitary elements. (Hint: $A = (U + U^*)/2$.)

13. Solve the equation $TAT = B$ for the unknown $T \geqslant 0$, given $A, B \geqslant 0$ invertible (Hint: $A^{1/2}TATA^{1/2} = (A^{1/2}TA^{1/2})^2$).

14. Consider $\phi \in X^*$ which preserves inequalities, $0 \leqslant A \Rightarrow 0 \leqslant \phi A$; it satisfies Proposition 15.42 except that $|\phi T|^2 \leqslant \phi 1\,\phi(T^*T) \leqslant (\phi 1)^2\|T\|^2$. Such positive functionals, as they are called, are positive multiples of states.

15. If $J : X \to Y$ is an algebraic *-morphism, then

X / ker J = im J O im J is closed. 

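Exercise 5(b) can be made concrete with a small computation. The sketch below uses two real symmetric $2 \times 2$ matrices chosen here purely for illustration (they are an assumption, not necessarily the pair intended in the exercise), and tests positivity through the elementary criterion that a real symmetric $2 \times 2$ matrix is positive exactly when its trace and determinant are both non-negative.

```python
# Sketch: 0 <= S <= T need not give S^2 <= T^2 when S and T do not commute.
# The matrices S and T are illustrative choices, not taken from the text.

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sub(A, B):
    """Entrywise difference A - B."""
    return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]

def is_positive(A, tol=1e-12):
    """A real symmetric 2x2 matrix is >= 0 iff trace >= 0 and det >= 0."""
    (a, b), (c, d) = A
    assert abs(b - c) <= tol, "matrix must be symmetric"
    return a + d >= -tol and a * d - b * c >= -tol

S = [[1, 1], [1, 1]]
T = [[2, 1], [1, 1]]

print(is_positive(S))                                  # 0 <= S
print(is_positive(sub(T, S)))                          # S <= T
print(is_positive(sub(matmul(T, T), matmul(S, S))))    # is S^2 <= T^2 ?
print(matmul(S, T) == matmul(T, S))                    # do S and T commute?
```

Here $T - S$ is a rank-one projection, yet $T^2 - S^2$ has negative determinant, so the squares are not comparable in the required direction.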

Polar Decomposition An important application of the use of square roots of positive 
self-adjoint elements is the following generalization of the polar decomposition of 
complex numbers to B(H ): 

Proposition 15.45 Polar Decomposition 

Every operator $T \in B(H)$ has a decomposition $T = U|T|$, in which $|T| := \sqrt{T^*T} \geqslant 0$ and $U : \operatorname{im}|T| \to \operatorname{im} T$ is an isometry.

Proof $T^*T$ is positive, so its square root $R := \sqrt{T^*T} \geqslant 0$ can be defined. $R$ reduces to the previous definition of $|T|$ when $T$ is normal, so it is common to write $|T|$ for $R$. Then $\||T|x\| = \|Tx\|$ for all $x \in H$, as

$$\langle |T|x, |T|y\rangle = \langle x, |T|^2y\rangle = \langle x, T^*Ty\rangle = \langle Tx, Ty\rangle. \qquad (15.4)$$

Let $U : \operatorname{im}|T| \to \operatorname{im} T$ be defined by $U(|T|x) := Tx$; it is well-defined by (15.4),

$$|T|(x - y) = 0 \Leftrightarrow T(x - y) = 0,$$

and isometric, so can be extended isometrically to $\overline{\operatorname{im}|T|} \to \overline{\operatorname{im} T}$ (Examples 8.9(4)). It can be extended further to the whole of the Hilbert space $H$ by letting $Ux = 0$ whenever $x$ belongs to the orthogonal space $\ker|T|$, in which case it is called a partial isometry. □

Furthermore, when $T$ is normal and $U$ is extended to a partial isometry, $T = |T|U$ is also true: $\ker|T| = \ker T$ by (15.4) and since $\ker T^* = \ker T$ (Proposition 15.12),

$$\overline{\operatorname{im}|T|} = (\ker|T|)^\perp = (\ker T)^\perp = (\ker T^*)^\perp = \overline{\operatorname{im} T}.$$

In fact,

for $x \in \ker|T|$, $|T|Ux = 0 = Tx$,

for $x = |T|y \in \operatorname{im}|T|$, $|T|Ux = |T|U|T|y = |T|Ty = T|T|y = Tx$,


and by extension $|T|Ux = Tx$ for $x \in \overline{\operatorname{im}|T|}$ as well.

On the other hand, if T is invertible, then it implies, in succession, that T* , T*T, 
and |T| are invertible; thus U is an onto isometry on H, hence unitary. 
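For an invertible real $2 \times 2$ matrix the two factors can be computed in closed form. The sketch below is an illustration under assumptions of our own: it treats only real matrices (so the adjoint is the transpose), and it obtains $\sqrt{P}$ for a positive $2 \times 2$ matrix $P$ from the identity $\sqrt{P} = (P + \sqrt{\det P}\,I)/\sqrt{\operatorname{tr}P + 2\sqrt{\det P}}$, which follows from the Cayley-Hamilton theorem; the function names are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    """Adjoint of a real matrix."""
    return [[A[j][i] for j in range(2)] for i in range(2)]

def sqrt_pos_2x2(P):
    """Positive square root of a positive-definite real 2x2 matrix."""
    s = math.sqrt(P[0][0] * P[1][1] - P[0][1] * P[1][0])   # sqrt(det P)
    t = math.sqrt(P[0][0] + P[1][1] + 2 * s)               # sqrt(tr P + 2 sqrt(det P))
    return [[(P[0][0] + s) / t, P[0][1] / t],
            [P[1][0] / t, (P[1][1] + s) / t]]

def inverse_2x2(A):
    """Inverse of an invertible 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def polar(T):
    """Return (U, |T|) with T = U|T|, for an invertible real 2x2 matrix T."""
    abs_T = sqrt_pos_2x2(matmul(transpose(T), T))   # |T| = sqrt(T*T)
    U = matmul(T, inverse_2x2(abs_T))               # U = T |T|^{-1}
    return U, abs_T

T = [[1.0, 1.0], [0.0, 1.0]]
U, abs_T = polar(T)
print([[round(u, 2) for u in row] for row in U])
print([[round(a, 2) for a in row] for row in abs_T])
```

Because $T$ is invertible here, $U = T|T|^{-1}$ is defined on all of the space and is orthogonal, in line with the remark that $U$ is unitary when $T$ is invertible.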

Proposition 15.46 


(i) Every unitary operator in $B(H)$ is of the type $e^{iT}$ with $T \in B(H)$ self-adjoint.

(ii) The group of invertible operators $\mathcal{G}(H) \subset B(H)$ is connected and generated by the exponentials.


Proof (i) The polar decomposition of any self-adjoint operator $B \in B(H)$ is $B = V|B|$ where

$$Vx := \begin{cases} x & x \in \ker B_- \\ -x & x \in (\ker B_-)^\perp = \overline{\operatorname{im} B_-} \end{cases}$$

since $B_+x \in \ker B_-$ ($B_-B_+ = 0$). Note that $V^2 = I$. Hence

$$V|B|x = VB_+x + VB_-x = B_+x - B_-x = Bx.$$

Let $U$ be any unitary operator on $H$. It equals $U = A + iB$ where $A$, $B$ are commuting self-adjoint operators such that $A^2 + B^2 = I$. It follows that $A$ commutes with $B_-$ (Example 15.37(3)) and thus preserves $\ker B_-$ and $\overline{\operatorname{im} B_-}$ (Exercise 8.10(19)). Accordingly, if $B = V|B|$ is the polar decomposition of $B$ as above, then $V$ commutes with $A$: for all $x = a + b \in \ker B_- \oplus \overline{\operatorname{im} B_-}$,

$$VAx = VA(a + b) = Aa - Ab = A(a - b) = AVx.$$

The function $\arccos : [-1, 1] \to [0, \pi]$ is a continuous function, and $-1 \leqslant A \leqslant 1$, so we can define $C := \arccos A \in B(H)$, and this commutes with $V$. Let $T := VC$, so that $T^2 = V^2C^2 = C^2$. Hence,

$$\begin{aligned}
e^{iT} &= \Big(I - \frac{T^2}{2!} + \cdots\Big) + iT\Big(I - \frac{T^2}{3!} + \cdots\Big) \\
&= \Big(I - \frac{C^2}{2!} + \cdots\Big) + iV\Big(C - \frac{C^3}{3!} + \cdots\Big) \\
&= \cos C + iV\sin C \\
&= A + iV|B| \qquad (\sin\circ\arccos(A) = \sqrt{1 - A^2} = |B|) \\
&= U.
\end{aligned}$$

(ii) Consider the polar decomposition of an invertible operator $T = U|T|$, where $U$ is unitary and $|T|$ is invertible. By the above, $U = e^{iA}$, while $|T|$ has a logarithm, $|T| = e^{B}$ (Exercise 14.26(1)). Hence $T = e^{iA}e^{B}$ lies in the connected component of $I$ (Proposition 13.24), which must therefore equal $\mathcal{G}(H)$. □

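The construction in part (i) can be followed step by step for a plane rotation. This is a sketch under assumptions made here for illustration: $U$ is the rotation of $\mathbb{C}^2$ by an angle $\theta \in\, ]0, \pi[$, for which $A = \cos\theta\, I$, $|B| = \sin\theta\, I$, $C = \theta I$, and $V$ is the matrix written in the code; the exponential is summed as a truncated power series.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(M, terms=40):
    """Truncated power series for exp(M); adequate for small 2x2 matrices."""
    result = [[1 + 0j, 0j], [0j, 1 + 0j]]
    term = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

theta = 2.0                                   # any angle in ]0, pi[
c, s = math.cos(theta), math.sin(theta)
U = [[c, -s], [s, c]]                         # unitary (a rotation)

# U = A + iB with A = (U + U*)/2 = cos(theta) I and B = (U - U*)/(2i)
B = [[(U[i][j] - U[j][i]) / 2j for j in range(2)] for i in range(2)]
V = [[0, 1j], [-1j, 0]]                       # B = V|B| with |B| = sin(theta) I
T = [[theta * v for v in row] for row in V]   # T = V C with C = arccos A = theta I

E = expm([[1j * t for t in row] for row in T])   # e^{iT}
print(max(abs(E[i][j] - U[i][j]) for i in range(2) for j in range(2)))
```

The self-adjoint exponent found this way is $T = \theta V$, and $iT$ is the familiar real generator of rotations.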

Spectral Theorem for Normal Operators 

There is one further extension of the functional calculus of the C*-algebra B(H ): 
when T is a normal operator, f(T) may be defined even for bounded measurable 
functions. 

Let $1_\Omega$ be the characteristic function defined on a bounded open subset $\Omega \subset \mathbb{C}$. To find an operator that corresponds to $1_\Omega$, we will be needing the following lemma:

Monotone Convergence Theorem for Self-Adjoint Operators: If $A_n \geqslant 0$ is a decreasing sequence of commuting self-adjoint operators in $B(H)$ then $A_n$ converges strongly to some operator $A \geqslant 0$.

Proof It is easy to show that when $0 \leqslant S \leqslant T$ commute,

$$S^2 \leqslant S^2 + (T - S)^2 = T^2 - 2S(T - S) \leqslant T^2.$$

From this it follows that $A_n^2$ is also a decreasing sequence, as is $\|A_nx\|$ by Example 6 above. Also $\|A_nx - A_mx\|^2 \leqslant \big|\|A_mx\|^2 - \|A_nx\|^2\big| \to 0$ as $n, m \to \infty$, since $A_nA_m \geqslant A_n^2$ for $n \geqslant m$, so $(A_nx)$ is a Cauchy sequence in $H$. Now apply the corollary of the uniform bounded theorem (Corollary 11.35). □

It follows easily from this that an increasing sequence of bounded self-adjoint operators $A_n \leqslant c$ converges strongly to some operator $A \leqslant c$.

There exist increasing sequences of positive continuous functions $f_n : \mathbb{C} \to \mathbb{R}^+$ which converge pointwise to $1_\Omega$; for example, take $f_n(z) := \min(1, n\,d(z, \Omega^c))$. Using the continuous functional calculus defined in Theorem 15.36, $f_n(T)$ exist as positive self-adjoint operators on $H$ with norm equal to $\|f_n(T)\| = \|f_n\|_C = 1$.

We can therefore define $1_\Omega(T)x := \lim_{n\to\infty} f_n(T)x$ for all $x \in H$. For any closed subset $F$ of $\mathbb{C}$, there are nested open sets $U_n$ such that $F = \bigcap_n U_n$. So $1_F(T)$ can be defined by $1_F(T)x := \lim_{n\to\infty} 1_{U_n}(T)x$ by the monotone convergence theorem above. Some properties of $1_\Omega(T)$ are:

1. $1_\Omega(T)$ is an orthogonal projection; so $1_\Omega(T) \geqslant 0$.

Proof Write $A_n := f_n(T)$ and $A := 1_\Omega(T)$. Then

$$\langle Ay, x\rangle = \lim_{n\to\infty}\langle A_ny, x\rangle = \lim_{n\to\infty}\langle y, A_nx\rangle = \langle y, Ax\rangle,$$

$$\|(A_n^2 - A^2)x\| = \|(A_n + A)(A_n - A)x\| \leqslant (1 + \|A\|)\|(A_n - A)x\| \to 0.$$

Thus $1_\Omega(T)^2 = 1_\Omega(T)$ is self-adjoint, and hence orthogonal (Example 15.14(1)).

2. (a) If $U$, $V$ are disjoint open sets, then $1_U(T) + 1_V(T) = 1_{U\cup V}(T)$,

(b) $1_{U\cap V}(T) = 1_U(T)1_V(T)$.

Proof If $f_n(z) \to 1_U(z)$ and $g_n(z) \to 1_V(z)$ for $z \in \mathbb{C}$ then $f_n(z) + g_n(z) \to 1_U(z) + 1_V(z) = 1_{U\cup V}(z)$. So by the continuous functional calculus and the strong convergence of $f_n(T)$ and $g_n(T)$, it follows that $f_n(T)x + g_n(T)x \to 1_{U\cup V}(T)x$ for any $x \in H$.

Similarly, the second statement results from $f_n(z)g_n(z) \to 1_U(z)1_V(z) = 1_{U\cap V}(z)$.

3. $1_\emptyset(T) = 0$, $1_{\sigma(T)}(T) = I$ (since if $\sigma(T) \subset U$ and $f_n \to 1_U$, then $f_n|_{\sigma(T)} = 1$ for $n$ large enough).

The projections 1e(T) for Borel sets E are defined by the same procedure and 
are said to be the spectral measure associated with T. We gloss over the details of 
the exact definition (see [10]). 

One can now follow the same steps of creating the space of step functions through to $L^\infty(\mathbb{C})$, but starting from the projections $1_E(T)$ as ‘step functions’. The end result is a functional calculus in which $f(T)$ is defined for any complex-valued $f \in L^\infty(\sigma(T))$: if $f$ is approximated by $\sum_i a_i 1_{U_i}$, then $f(T)$ is approximately $\sum_i a_i 1_{U_i}(T)$. Indeed, $f(T)$ is still meaningful even if $f \in L^1(\sigma(T))$ but need not be a “bounded” (i.e., continuous) operator.

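Proposition 15.47 von Neumann’s Spectral Theorem

For any normal operator $T$ and $f \in L^\infty(\sigma(T))$, there is a spectral measure $E_\lambda$ such that

$$f(T) = \int_{\sigma(T)} f(\lambda)\,dE_\lambda.$$

Proof For any $x, y \in H$, define $\mu_{x,y}(U) := \langle x, 1_U(T)y\rangle$ for any open bounded subset $U \subseteq \mathbb{C}$. By the properties proved above, $\mu_{x,y}$ can be extended to a measure with support equal to $\sigma(T)$. (It is not a Lebesgue measure on $\mathbb{C}$ as it is not translation invariant, but Borel sets are $\mu_{x,y}$-measurable.) It has the additional properties:

$$\mu_{x,y_1+y_2} = \mu_{x,y_1} + \mu_{x,y_2},\quad \mu_{x,\lambda y} = \lambda\mu_{x,y},\quad \mu_{y,x} = \overline{\mu_{x,y}},\quad 0 \leqslant \mu_{x,x} \leqslant \|x\|^2.$$

It follows that for any $f \in L^\infty(\sigma(T))$, $\langle\langle x, y\rangle\rangle := \int_{\sigma(T)} f\,d\mu_{x,y}$ is a semi-inner-product which is bounded in the sense $|\langle\langle x, y\rangle\rangle| \leqslant \|f\|_{L^\infty}\|x\|\|y\|$. Thus, by Exercise 15.44(8), $\langle\langle x, y\rangle\rangle = \langle x, Sy\rangle$ for some continuous operator $S$ which we henceforth call $f(T)$,

$$\langle x, f(T)y\rangle = \int_{\sigma(T)} f\,d\mu_{x,y}.$$

$f(T)$ agrees with the earlier definition for $f \in C(\sigma(T))$: Any such $f$ is uniformly continuous, so for $\delta$ small enough $fB_\delta(z) \subseteq B_\epsilon(f(z))$, independently of $z \in \sigma(T)$. Let $B_i$ be squares, with centers $\lambda_i$ and diameter less than $\delta$, which partition $\sigma(T)$; one can find slightly smaller closed squares $A_i \subset B_i$ and slightly larger open squares $C_i \supset B_i$, such that $\sum_i \mu_{x,y}(C_i \setminus A_i) < \epsilon$. Moreover, one can find continuous functions $h_i$ such that $1_{A_i} \leqslant h_i \leqslant 1_{C_i}$ and $\sum_i h_i = 1$; for example, let $h_i(s, t) := h(s)h(t)$ where $h(t) = \min(1, r\,d(t, I^c))$ is a continuous real function with support equal to $I$ and taking the value 1 just inside it. Then (writing $\mu = \mu_{x,y}$)

$$\langle x, f(T)y\rangle = \sum_i \langle x, fh_i(T)y\rangle \approx \sum_i f(\lambda_i)\langle x, h_i(T)y\rangle \approx \sum_i f(\lambda_i)\mu(B_i).$$

More rigorously, (it is enough to consider real-valued functions)

$$\begin{aligned}
\langle x, fh_i(T)y\rangle &\leqslant (f(\lambda_i) + \epsilon)\mu(C_i) \\
&= (f(\lambda_i) + \epsilon)\mu(B_i) + (f(\lambda_i) + \epsilon)(\mu(C_i) - \mu(B_i)) \\
-\langle x, fh_i(T)y\rangle &\leqslant -f(\lambda_i)\mu(B_i) + \epsilon\mu(B_i) + (f(\lambda_i) - \epsilon)(\mu(B_i) - \mu(A_i)) \\
\therefore\ |\langle x, fh_i(T)y\rangle - f(\lambda_i)\mu(B_i)| &\leqslant \epsilon\mu(B_i) + |f(\lambda_i) + \epsilon|(\mu(C_i) - \mu(A_i)) \\
\therefore\ \Big|\langle x, f(T)y\rangle - \sum_i f(\lambda_i)\mu(B_i)\Big| &= \Big|\sum_i \langle x, fh_i(T)y\rangle - \sum_i f(\lambda_i)\mu(B_i)\Big| \\
&\leqslant \sum_i |\langle x, fh_i(T)y\rangle - f(\lambda_i)\mu(B_i)| \\
&\leqslant \sum_i (|f(\lambda_i)| + \epsilon)(\mu(C_i) - \mu(A_i)) + \epsilon\mu(B_i) \\
&\leqslant (\|f\|_C + \epsilon)\epsilon + \epsilon.
\end{aligned}$$

Hence, in the limit $\epsilon \to 0$, $\langle x, f(T)y\rangle = \int_{\sigma(T)} f\,d\mu_{x,y}$.

The map $f \mapsto f(T)$ is a *-morphism from $L^\infty(\sigma(T))$ to $B(H)$: Linearity is immediate, $\langle x, (f + \lambda g)(T)y\rangle = \int_{\sigma(T)} (f + \lambda g)\,d\mu_{x,y} = \langle x, (f(T) + \lambda g(T))y\rangle$.
$\bar{f}(T) = f(T)^*$ since

$$\langle x, \bar{f}(T)y\rangle = \int \bar{f}\,d\mu_{x,y} = \overline{\int f\,d\mu_{y,x}} = \overline{\langle y, f(T)x\rangle} = \langle f(T)x, y\rangle = \langle x, f(T)^*y\rangle.$$

$fg(T) = f(T)g(T)$ follows from

$$\int_{\sigma(T)} d\mu_{x,f(T)y} = \langle x, f(T)y\rangle = \int_{\sigma(T)} f\,d\mu_{x,y}$$
$$\Rightarrow\ \langle x, fg(T)y\rangle = \int fg\,d\mu_{x,y} = \int f\,d\mu_{x,g(T)y} = \langle x, f(T)g(T)y\rangle. \qquad □$$

In particular, $T = \int_{\sigma(T)} \lambda\,dE_\lambda$. This result, and the next one, are often claimed to be the pinnacle of the subject of functional analysis.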

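In finite dimensions the spectral integral reduces to a finite sum over eigenvalues, $f(T) = \sum_k f(\lambda_k)P_k$, with $P_k = 1_{\{\lambda_k\}}(T)$ the spectral projections. The sketch below illustrates this for a real symmetric $2 \times 2$ matrix with two distinct eigenvalues; the matrix and the helper names are assumptions made for the example, and the projections are obtained from the elementary formula $P_1 = (T - \lambda_2 I)/(\lambda_1 - \lambda_2)$.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(A, B, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(A[i][j] - B[i][j]) <= tol for i in range(2) for j in range(2))

def spectral_projections(T):
    """Eigenvalues and spectral projections of a real symmetric 2x2 matrix
    with two distinct eigenvalues."""
    tr = T[0][0] + T[1][1]
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    disc = math.sqrt(tr * tr - 4 * det)
    lam1, lam2 = (tr - disc) / 2, (tr + disc) / 2
    I = [[1.0, 0.0], [0.0, 1.0]]
    P1 = [[(T[i][j] - lam2 * I[i][j]) / (lam1 - lam2) for j in range(2)]
          for i in range(2)]
    P2 = [[(T[i][j] - lam1 * I[i][j]) / (lam2 - lam1) for j in range(2)]
          for i in range(2)]
    return (lam1, P1), (lam2, P2)

def apply_function(f, T):
    """f(T) = f(lam1) P1 + f(lam2) P2, the finite form of the spectral integral."""
    (lam1, P1), (lam2, P2) = spectral_projections(T)
    return [[f(lam1) * P1[i][j] + f(lam2) * P2[i][j] for j in range(2)]
            for i in range(2)]

T = [[2.0, 1.0], [1.0, 2.0]]
(lam1, P1), (lam2, P2) = spectral_projections(T)
R = apply_function(math.sqrt, T)            # the positive square root of T
print(lam1, lam2)
print(close(matmul(R, R), T))
```

Taking $f(\lambda) = \lambda$ recovers $T = \lambda_1P_1 + \lambda_2P_2$, the matrix counterpart of $T = \int \lambda\,dE_\lambda$.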

Embedding in B(H) 

Theorem 15.48 Gelfand-Naimark 

Every C*-algebra is embedded in $B(H)$, for some Hilbert space $H$.


Proof We have already seen that every Banach algebra $X$ is embedded in $B(X)$ (Theorem 13.8); as in the proof of that theorem, we will again denote elements of $X$ by lower-case letters. The main difficulty is that there is no natural inner product defined on $X$ or $B(X)$. Rather there are many semi-inner-products, one for each $\phi \in \mathcal{S}$, $\langle x, y\rangle_\phi := \phi(x^*y)$.

Let $M_\phi := \{\, x : \phi(x^*x) = 0 \,\}$; it is a closed left-ideal, since for any $a \in X$ and $x \in M_\phi$, then $ax \in M_\phi$,

$$0 \leqslant \phi(x^*a^*ax) \leqslant \phi(x^*x)\|a\|^2 = 0.$$

This allows us to turn $X/M_\phi$ into an inner product space, which can be completed to a Hilbert space $H_\phi$ (Examples 10.7(2) and 13.5(6)). The inner product on $X/M_\phi$ is given by

$$\langle x + M_\phi, y + M_\phi\rangle := \phi(x^*y).$$

The *-morphism $L : X \to B(H_\phi)$: For any $a \in X$, consider the linear map defined by $L_a(x + M_\phi) := ax + M_\phi$ on $X/M_\phi$; this is well-defined since $aM_\phi \subseteq M_\phi$. It is continuous with $\|L_a\| \leqslant \|a\|$ since,

$$\|L_a(x + M_\phi)\| = \|ax + M_\phi\| = \sqrt{\phi(x^*a^*ax)} \leqslant \sqrt{\phi(x^*x)}\,\|a\| = \|a\|\|x + M_\phi\|.$$

This map extends uniquely to one in $B(H_\phi)$ (Example 8.9(4)).

Clearly $L_a$ is linear in $a$, $L_{ab} = L_aL_b$, and $L_1 = I$, but it also preserves the involution $L_{a^*} = L_a^*$,

$$\langle x + M_\phi, L_a(y + M_\phi)\rangle = \phi(x^*ay) = \phi((a^*x)^*y) = \langle L_{a^*}(x + M_\phi), y + M_\phi\rangle.$$

It remains a *-morphism when extended to $B(H_\phi)$, by continuity of the adjoint.

John von Neumann (1903-1957) Originally from Budapest, he studied in Berlin, under Weyl and Pólya, but graduated at the age of 23 under Fejér in Budapest with a thesis on ordinal numbers. A young party-going genius, in 1926-30 he defined Hilbert spaces axiomatically as foundation for the brand new quantum mechanics and generalized the spectral theorem to unbounded self-adjoint operators. In the 1930s he went to the Princeton Institute, proved the ergodic theorem, and studied rings of operators and group representations; only turbulent fluid dynamics proved too hard (it remains unsolved today); in 1944 he started game theory, proving the mini-max theorem, then on to computers and automata theory.

Fig. 15.1 von Neumann

The final Hilbert space $H$. However $L$ need not be 1-1. To remedy this deficiency, let $H := \prod_{\phi\in\mathcal{S}} H_\phi$ be the Hilbert space of “sequences” $x := (x_\phi)_{\phi\in\mathcal{S}}$ such that $x_\phi \in H_\phi$ and $\sum_{\phi\in\mathcal{S}} \langle x_\phi, x_\phi\rangle_{H_\phi} < \infty$; it has the inner product

$$\langle x, y\rangle := \sum_{\phi\in\mathcal{S}} \langle x_\phi, y_\phi\rangle_{H_\phi}.$$

It is straightforward to show that $H$ is indeed a Hilbert space, by analogy with $\ell^2$.

Let $J_ax := (L_ax_\phi)_{\phi\in\mathcal{S}}$, so that $J_a : H \to H$ is obviously linear, and also continuous since

$$\|J_ax\|^2 = \sum_\phi \|L_ax_\phi\|^2 \leqslant \|a\|^2\sum_\phi \|x_\phi\|^2 = \|a\|^2\|x\|^2.$$

The mapping $a \mapsto J_a$, $X \to B(H)$ is an algebraic *-morphism,

$$\langle y, J_ax\rangle = \sum_\phi \langle y_\phi, L_ax_\phi\rangle = \sum_\phi \langle L_a^*y_\phi, x_\phi\rangle = \langle J_{a^*}y, x\rangle.$$

Moreover it is 1-1, for if $J_a = 0$ then $L_ax_\phi = 0$ for any $x_\phi$ and $\phi \in \mathcal{S}$, in particular $a + M_\phi = L_a1 = 0$. But this means that for all $\phi \in \mathcal{S}$, $a \in M_\phi$, i.e., $\phi(a^*a) = 0$, and this can only hold when $\sigma(a^*a) \subseteq \mathcal{S}(a^*a) = \{0\}$, so $\|a\|^2 = \|a^*a\| = 0$ and $a = 0$.

Since every such *-morphism between C*-algebras is isometric, the theorem is proved. □


Exercises 15.49 


1. Examples of polar decompositions are 


( 0 - 2 ) (o-l) (02) 


$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0.89 & 0.45 \\ -0.45 & 0.89 \end{pmatrix}\begin{pmatrix} 0.89 & 0.45 \\ 0.45 & 1.34 \end{pmatrix}$$


, and 


2. If $T$ is a compact operator in $B(H)$ with singular values $\lambda_n$ and singular vectors $e_n$, $e'_n$, then $|T|e_n = \lambda_ne_n$ and $U : e_n \mapsto e'_n$.

3. The polar decomposition of the right-shift operator in $\ell^2$ is trivial: $|R| = I$. What is it for the left-shift operator?

4. $T^* = |T|U^*$, $|T| = U^*T = T^*U$, and $|T^*| = UT^* = TU^*$, since $U^*U$ is a projection onto $\overline{\operatorname{im}|T|}$ and $UU^*$ is a projection onto $\overline{\operatorname{im} T}$. $\||T|\| = \|T\|$.

5. (a) $T$ is normal $\Leftrightarrow$ $|T^*| = |T|$,

(b) $T$ is positive self-adjoint $\Leftrightarrow$ $T = |T|$,

(c) $T$ is unitary $\Leftrightarrow$ $|T| = I$ AND $T$ is invertible.

6. If $|S| = |T|$ and $T$ is invertible then $ST^{-1}$ is unitary.

7. When $T$ is compact normal, with polar decomposition $T = |T|U = U|T|$, then $U$ and $|T|$ are simultaneously diagonalizable, $U = P^{-1}e^{i\Theta}P$, $|T| = P^{-1}DP$, so that $T = P^{-1}De^{i\Theta}P$.

8. Adapt the proof of the Polar Decomposition theorem to show that if $T^*T \leqslant S^*S$ then the map $U : \operatorname{im} S \to \operatorname{im} T$, $Sx \mapsto Tx$, is a well-defined operator with $\|U\| \leqslant 1$ and $T = US$.

9. Every ideal $\mathcal{I}$ in $B(H)$ is a *-ideal since

$$T \in \mathcal{I} \ \Rightarrow\ |T| = U^*T \in \mathcal{I} \ \Rightarrow\ T^* = |T|U^* \in \mathcal{I}.$$

10. Every invertible element $T$ of a C*-algebra can be written uniquely as $T = U|T|$ where $U$ is unitary.

11. Trace-class Operators: Let $Tr := \{\, T \in B(H) : \operatorname{tr}|T| < \infty \,\}$ with norm $\|T\|_{Tr} := \operatorname{tr}|T|$ (Proposition 15.30 and Examples 15.33(6)).

(a) $\|T\|_{Tr} = \||T|^{1/2}\|_{HS}^2$, and $T \in Tr \Leftrightarrow |T|^{1/2} \in HS$,

(b) $\operatorname{tr}(T)$ is independent of the orthonormal basis,

(c) $|\operatorname{tr}(ST)| \leqslant \|S\|\|T\|_{Tr}$; in particular $\|T\|_{HS} \leqslant \|T\|_{Tr}$,

(d) $Tr$ is a closed *-ideal in $B(H)$,

(e) $T \in Tr \Leftrightarrow T = AB$ where $A, B \in HS$,

(f) $\operatorname{tr}|T| = \sum_n |\lambda_n|$, where $(\lambda_n)$ are the singular values of $T$ (repeated according to their multiplicities), $\operatorname{tr} T = \sum_n \lambda_n$ holds when $T$ is normal and $\lambda_n$ are its eigenvalues.

12. GNS construction: When $X$ is represented in $B(H)$, every state $\phi \in \mathcal{S}(X)$ is associated with a unit vector $x \in H$, such that $\phi y = \langle x, J_yx\rangle$.

Proof The vector in question is $x := (x_\psi)_{\psi\in\mathcal{S}}$ where $x_\phi = 1 + M_\phi$ and $x_\psi = 0$ otherwise. For every $y \in X$,

$$\phi(y) = \langle 1 + M_\phi, y + M_\phi\rangle = \langle x_\phi, L_yx_\phi\rangle_{H_\phi} = \sum_\psi \langle x_\psi, L_yx_\psi\rangle_{H_\psi} = \langle x, J_yx\rangle_H.$$

Remarks 15.50

1. The Banach algebra axiom $\|1\| = 1$ is redundant for C*-algebras as it follows from $\|T^*T\| = \|T\|^2$ (assuming $X \neq 0$).

2. The use of $A < B$ is best avoided: it may either mean $A \leqslant B$ but $A \neq B$ or that $\sigma(B - A) \subset\, ]0, \infty[$.


Hints to Selected Problems 


2.2 (1) Writing $a := x - z$, $b := z - y$, and substituting into $|a + b| \leqslant |a| + |b|$ gives the triangle inequality.

2.3 (2) (a) $\exists a \in A$, $\exists b \in B$, $d(a, b) < 2$, (b) $\forall\epsilon > 0$, $\exists a \in A$, $\exists b \in B$, $d(a, b) < \epsilon$.

2.14 (5) The two sets have, respectively, the shapes of a diamond, and a square with 
a smaller concentric square removed. 

(9) For example, R \ a. 

2.20 (2) The complement of the set is $\{\, x \in \mathbb{Q} : x^2 > 2 \,\}$ since $\sqrt{2}$ is irrational. To prove the set is open, one needs to find a small enough $\epsilon$ such that

$$2 < (x - \epsilon)^2 = x^2 - 2\epsilon x + \epsilon^2.$$

(6) Try the graph of the exponential function and the x-axis in R 2 . 

(7) The Cantor set is the intersection of all of these closed intervals. 

(8) First show the set { x e [0, 1] : ^ 5 } for fixed k is closed. 

(10) The answer to the first question is of course no: all points on a circle are equally 
close to the center; the second is also false e.g. in Z; it is true however in R 2 because 
the line joining an interior point to x contains closer points. What properties does 
the metric space need to have for this statement to be true? 

(13) No. Take the subsets $A := [-1, 1]$ and $B := \mathbb{R} \setminus \{0\}$ in $\mathbb{R}$.

2.22 (2) Any ball $B_r(x)$ will contain a point $a$ of the dense open set $A$. There will therefore be a small ball $B_\epsilon(a) \subseteq A \cap B_r(x)$ which contains a point $b \in B$.

(3) The complement of the Cantor set is open and dense.

(5) $\partial U = \bar{U} \setminus U$ contains no balls.

3.5 (1c) n/a” = n/{ 1 + 5)" < ^ -> 0. 

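(1e) $a_n := (1 + \frac{1}{n})^n = 2 + \frac{1}{2!}(1 - \frac{1}{n}) + \frac{1}{3!}(1 - \frac{1}{n})(1 - \frac{2}{n}) + \cdots + \frac{1}{n^n}$. So $a_{n+1} > a_n$, yet $a_n < 2 + \frac{1}{2!} + \frac{1}{3!} + \cdots < 2 + \frac{1}{2} + \frac{1}{4} + \cdots = 3$.

(1f) $a_n \to \infty$ means $\forall\epsilon > 0$, $\exists N$, $n \geqslant N \Rightarrow a_n > \epsilon$.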

(2) The limits must satisfy x — 2 + \fx and x — 1 + \/x respectively. 

(3) Eventually, \a„\ < a < 1, so \a n \ n < a" . 


J. Muscat, Functional Analysis, DOI: 10.1007/978-3-319-06728-5,
© Springer International Publishing Switzerland 2014


3.12 (3) See Proposition 7.8 

(4) If $x_n \to x$ then $x \neq 0$ and $|x_n| \geqslant c > 0$, so that $|1/x_n - 1/x| = |x - x_n|/(|x_n||x|) \to 0$.

(10a) The map $f(x) = (\cos x, \sin x)$ is a continuous bijective map from $[0, 2\pi[$ to the circle. The inverse map is discontinuous at $(1, 0)$.

(10b) Take $f$ to be a constant function, and $x_n = n$.

(10c) Take $f(x) := x^2$ and $U := \,]-1, 1[$. Examples of open mappings on $\mathbb{R}$ are polynomials which have no local maxima/minima.

(11) $(f^{-1}F)^c = f^{-1}F^c$ is open. The identity map $[0, 1[\, \to [0, 2]$ is a continuous open mapping whose image is not closed.

(17) d(x, A)/(d(x , A) + d(x, B)). 

(18) All non-empty open intervals of the type $]a, b[$ are homeomorphic to, say, $]0, 1[$ by stretching and translating. $]0, 1[$ is homeomorphic to $]0, \infty[$ via $x \mapsto 1/x - 1$, and this, in turn, is homeomorphic to $\mathbb{R}$ via the map $x \mapsto x + \sqrt{x^2 + 1}$ from $\mathbb{R}$ onto $]0, \infty[$ (for example). Similarly, $]a, \infty[$ and $]-\infty, b[$ are homeomorphic to them as well.

(19) Points $\{x\}$ are open in $\mathbb{N}$ but not in $\mathbb{Q}$.

4.10 (1) The difference between the $n$th and $m$th terms of decimal approximations is at most $10^{-\min(m,n)}$.

(4) The finite number of values have a minimum distance e between them. 

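(5) Taking $m \leqslant n$,

$$\begin{aligned}
d(x_n, x_m) &\leqslant d(x_n, x_{n-1}) + \cdots + d(x_{m+1}, x_m) \\
&\leqslant a(c^{n-1} + \cdots + c^m) \\
&\leqslant \frac{ac^m}{1 - c} \to 0 \quad \text{as } m, n \to \infty.
\end{aligned}$$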

Note that 1 /n —y oo. 


(6) $|d(x_n, y_n) - d(x_m, y_m)| \leqslant |d(x_n, y_n) - d(y_n, x_m)| + |d(x_m, y_n) - d(x_m, y_m)| \leqslant d(x_n, x_m) + d(y_n, y_m)$

(7) For example, the continuous function $f(x) := 1/x$, defined on $]0, 1] \to [1, \infty[$, maps the Cauchy sequence $(1/n)$ to the unbounded sequence $(n)$.

(9) + T -Vn = V«((l + 1 /n) l/1 - 1) = jjn + • ' ' 

(11) If { x„ } are the values of a Cauchy sequence, and x is a boundary point, then 
there is a subsequence x m — > x (by Proposition 3.4). 

(14) Any Cauchy sequence in a discrete metric space must eventually be constant. 

(15) The intersection of the balls can contain at most one point, since $r_n \to 0$. In fact, if $x_n \to x$, then $x \in B_{r_n}[x_n]$ for all $n$, since the balls are nested.

(16) First show that $f(n) = f(1 + \cdots + 1) = nf(1)$, then $f(m/n) = \frac{m}{n}f(1)$.

4.17 (1b) |(x 2 - x 1 )y 2 + x 1 (y 2 - yi)\ < (\y 2 \ + |xi +x 2 |)|xi - x 2 | < 3|xi - x 2 |,
l(xi +x 2 )(xi - x 2 ) + (y 2 - yi)\ < 2|xi - x 2 | + \yi - y 2 1.

(5) Let $f : X \to Y$ be an equivalence; then every Cauchy sequence $(x_n)$ in $X$ corresponds to a Cauchy sequence in $Y$, by uniform continuity and Proposition 4.12. Since equivalences are homeomorphisms, $(x_n)$ converges precisely when $(f(x_n))$ does. So $X$ is complete $\Leftrightarrow$ $Y$ is complete.

4.21 (2) Repeat the proof of Proposition 2.10, using B, n (a„) instead of B,-( x )(x), 
where a n is an approximation of x. 

(4) Let $X$ be an uncountable set with the discrete metric. Then $B_{1/2}(x)$, for each $x \in X$, form an uncountable collection of disjoint sets.

5.7 (1) Take $X \setminus \{x_1\}$ and $X \setminus \{x_2\}$ as the open sets; alternatively take small enough balls. For (b), take $X \setminus F_1$ and $X \setminus F_2$.

(2) To show that every subset of $\mathbb{Q}$ is disconnected, use the same idea with some other irrational.

(5) Consider the open sets $f^{-1}\{0\}$ and $f^{-1}\{1\}$.

(11) Suppose f(a) < f{y)\ f(x) > f(y) is impossible else there is some z e [a, x] 
such that f(z) = f(y). 

5.12 (2) The metric space is the union of the path images, whose intersection contains 
the fixed point. 

(5) Use Theorem 5.9 with A y := X x { y } and fi := {ro} x F. 

(6) Without loss of generality, take x — 0 ; then R 2 \ { x } is connected using the unit 
circle and radial lines te for t > — 1 and unit vectors e. 

(8) Otherwise, the interior and exterior of the set would disconnect a component. 
(10a) If a component C has a boundary point a f C, then C U B € (a ) would be a 
strictly larger connected set. 

6.4 (3) If $B$ is bounded, so $B \subseteq B_r(x)$, then $\bar{B} \subseteq B_r[x]$.

6.9 (3) From some N onwards, x„ e B e (x.y); cover the rest of the values x m with 
Re (Xm ) . 

(4) Let B C U,=i R./2(*»), then B C \jf =l B e/2 [xi\ c IjjLi B e (xd (Theo- 
rem2.19). 

6.22 (6) Suppose d(K, F ) = 0, then there are asymptotic sequences a n e K,b„ e F; 
( a n ) has a convergent subsequence, and therefore (b„) converges to the same limit. 
But then K fl F 0. 

(7) After showing K C B, (r. 0), use the fact that there is a point a e K which has 
maximum distance from (r, 0) less than r. 

(13) The unit sphere is a closed subset of the cube [—1, 1]^. 

(16) $X \times Y$ is complete and totally bounded by Proposition 4.7 and Exercise 6.9(1).

6.27 (1) If $f_n \to f$ with $f_n \in C(X, \mathbb{R})$, then $f_n(x) \to f(x)$ in $\mathbb{C}$, and taking the imaginary parts shows that $f(x) \in \mathbb{R}$.

(4) f(y) - f n {y) < f(y) - f N (y ) < I f(y) - /Ml + I fix) - f N (x)\ + \ f N (x) - 
fs(y)\ < e where N depends on x, and x — y\ < d, small enough but independent 
of x (Proposition 6. 17). So / — e ^ /„ ^ / on Bs(x) for n ^ N .By compactness, 
one N will suffice. 

(5) Convert any binary sequence (of 0s and Is) into a “tent” function in C(R + ); there 
are uncountably many such functions and their distance from each other is at least 1 . 

(8) $(x + |x|)/2 \approx x(x + 1)/2$.

7.7 (2) Balls look like circles, squares and diamonds in the 2-norm, $\infty$-norm, and 1-norm respectively.

(4) Let A := { \a n \ }, B := { \b n \ }. Then from Note 15 of Sect. 7.1, sup \Xa n \ = 

sup |k|A = |A.| sup A, sup \a n +b n \ ^ sup(A+B) ^ sup A + sup B, and if sup A = 0, 

then 0 ^ a„ ^ 0 implying a„ — 0 for all n. 

(7) The functions /„ := l[o,t/«] converge to 0 in L'[0, 1] but not in L°°[0, 1]. The 
inequality || jc H^oo ^ || jc ||^ i remains true for sequences, so convergence in i x implies 
that in i°° . 

( 8 ) For r > |||x|||, x e rC, so Xx e XrC = \X\rC , i.e., |||Ajc||| ^ |L||||*|||; but 

then |||*||] < |-|||A.x|||. If 5 > |||y|||, then x + y e rC + sC = (r + s)C, hence 

III* + 2 / III < 111*1 + \\\yj\- 

7.14 (3) Let x, y e C; then there are points a, b e C within e of x and y. So any 
point on the line tx + (1 — t)y is also close to a point on the line ta + (1 — t)b which 
lies in C because 

\\tx + (1 — t)y — ta — ( 1 — t)b\\ ^ f ||* — a|| + (1 — Oily — b\\ < e. 

(4) A convex set C is the union of line segments that start from a fixed point xy € C, 
then use Theorem 5.9. 

(5) If Xa n — » x, a, , e A, then a n x/X (for X ^ 0) and x/X e A. Conversely, if 
x e XA, i.e., * = Xa with a n — > a, then Xa n — > Xa = * and x e XA. 

Similarly, when a n -» a, a n e A, and b n -» b, b n e B, then a n + b n — * a + b, 
so a + b e A + B. An example in R. is A := [n + l/n : n = 2, 3, ... } and 
B := { —n : n — 1,2,...}. 

7.20 (lb) XSv *i = *j ~ X/Lo 1 *i 0 as N -* oo, since convergent 

sequences are Cauchy. 

(3) The odd sub-sums a i — (ai — 03 ) — (04 — ( 75 ) + ■ ■ • are decreasing, and bounded 
below by the increasing even sub-sums (ai — a 2 ) + (aj — 04 ) + ■ ■ • . 

7.22 (5) Applying the Cauchy test to the series 2 "/( 2 " p ) converges only 

when p- 1 < 0 ; for p = 1 , \ diverges; becomes £„ 2 <"„ which 

diverges; etc. 

(11) For N large enough ||*i + • • • + xn — *|| < e as well as X^Liv+l ll*nll < e - 
So for k large enough that include I ..... A', 

ll*ni H h *n* -*ll < 11*1 H b XN — *11 + y, ||*extra II 

8.10 (3) im R is closed since for Rx n — > y, the first components give 0 -* yo, so 
y = (0, yi, ...)= R(y\, . . .). 

(4) Proof that $\operatorname{im} T$ is not closed: Let $v_N := (1, 1/2, \ldots, 1/N, 0, 0, 0, \ldots)$, then $Tv_N = (1, 1/4, \ldots, 1/N^2, 0, \ldots)$ converges to $(1, 1/4, \ldots) \in \ell^1$ as $N \to \infty$ since

$$\|(0, \ldots, 0, 1/(N+1)^2, \ldots)\|_{\ell^1} = \sum_{n=N+1}^{\infty} \frac{1}{n^2} \to 0.$$

Yet, there is no sequence in $\ell^1$ which maps to this sequence since $(1, 1/2, 1/3, \ldots) \notin \ell^1$.

(5) $e_n/n \to 0$ in $\ell^1$ because $\|(0, \ldots, 0, 1/n, 0, \ldots)\|_{\ell^1} = 1/n \to 0$, but $e_n \not\to 0$ since $\|(0, \ldots, 0, 1, 0, \ldots)\|_{\ell^1} = 1$.

(6) T = L A -I. 

(9) If x„ — >■ x then x n — x — >■ 0 and T x n — Tx = T ( x n — x) — > T 0 = 0. 

(1 1) (a) If T e,- are linearly independent, then e, are linearly independent. So if 5T e; 
are linearly independent, so are T e, and dim(im(ST)) ^ dim(im(7’)); the other 
statements follow from im(S7’) C im(.S') and i m ( S + T) C im(.S') + im(7’). 

(b) If e \ , . . . , ek form a basis for ker 7’, extended by eu+\ e n to a basis for X, 

then Tc j i, i = k + 1 n, form a basis for im T . 

(c) Let Tei, i = 1, . . . , k, be a basis for ker S fl im T . Extend e, with a basis e'j for 

ker T . Then ST x = 0 implies Tx = X/= l T e,- , hence x = X/= l a ‘ e < + Z j Pj e 'j ■ 

(15) (3) ||L|| = 1 = || 7? ||; (6), using £°°, ||5|| = 1, ||7’|| = 2; (8.4(8)) when Tx = ax 
on l\ ||71 = ||a||^oo; (8.6(1)) || f A || = 1; (8.6(4)) ||^|| = 1; (8.6(6)) use ‘spike’ 
functions that are zero except near to 0; (12) ||</>|| = 1; (14) ||7’|| = 1, || 7^ || = 1, 
II M g \\ = \\g\\ c -,. 

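(16) Proof for first matrix. Assuming, without loss of generality, that $|\mu| \le |\lambda|$,

$$\left\|\begin{pmatrix}\lambda & 0\\ 0 & \mu\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}\right\|^2 = \left\|\begin{pmatrix}\lambda x\\ \mu y\end{pmatrix}\right\|^2 = |\lambda|^2|x|^2 + |\mu|^2|y|^2 \le |\lambda|^2(|x|^2 + |y|^2)$$

so $\|T\boldsymbol{x}\| \le |\lambda|\|\boldsymbol{x}\|$. However for $\boldsymbol{x} = \binom{1}{0}$, $T\boldsymbol{x} = \lambda\boldsymbol{x}$, so $|\lambda| \le \|T\| \le |\lambda|$.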

(18) Choose unit $x_n$ such that $\|T_nx_n\| \ge \|T_n\| - 1/2^n$.

8.14 (6) For $x = (a_i)$, take the supremum over $i$ of

$$\Big|T_{ii}a_i + \sum_{j \ne i} T_{ij}a_j\Big| \ge |T_{ii}||a_i| - \sum_{j \ne i}|T_{ij}|\,\|x\| \ge c\|x\| - (\sup_i |T_{ii}|)(\|x\| - |a_i|) \approx c\|x\|.$$


(8) If $J_X : X_1 \to X_2$ and $J_Y : Y_1 \to Y_2$ are the isomorphisms, then $J(T) :=
J_Y T J_X^{-1}$ gives the required isomorphism; note that $J^{-1}(S) = J_Y^{-1} S J_X$.

8.21 (3b) Show $y \mapsto (0, y) + X \times 0$ is an isometry.

(5) Let $\{a_n\}$ be dense in $M$ and $\{b_n + M\}$ dense in $X/M$. Then $\{a_n + b_m\}$ is dense
in $X$.

8.25 (5) See the Hilbert cube Exercise 9.10(3).

(6) Every point $x \in [\![e_1, \ldots, e_n]\!]$ is a boundary point (consider $x + \epsilon e_{n+1}$).

9.4 (2) The functionals on $c$ are $y^\top$ ($y \in \ell^1$) and Lim.

(6) coo C l £°, so l = co; 1/ log n does not belong to any ££°. 

9.7 (1) Let y n := x n - x e l x \ then Z£jv+i \Vm\ < Z/Zv+i 1 2/1/ 1 < e for some 
$N$ and all $n$. But $|y_{n1}| + \cdots + |y_{nN}| \to 0$ as $n \to \infty$, so $\sum_i |y_{ni}| < 2\epsilon$.

9.10 (4) It is enough to show $\|x - a\|_{\ell^1} < \epsilon$ for $a = (a_0, \ldots, a_N, 0, \ldots) \in c_{00}$, $N$
large enough.

9.15 (2) Try \a n \P I p' e~ Wn . 

9.27 (2) Look at the dual spaces of $L^1[0, 1]$ and $c_0$ to see why they are not isomorphic.


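(6) Write $\frac{\pi x^2}{\sigma^2} + 2\pi i x\xi = \frac{\pi}{\sigma^2}(x + i\sigma^2\xi)^2 + \pi\sigma^2\xi^2$ to simplify the integral.

10.10 (2) In Pythagoras’ theorem, $\|y + z\|^2 = \|y\|^2$ exactly when $z = 0$.
Consider

$$\Big\|\sum_n x_n\Big\|^2 = \Big|\sum_{n,m} \langle x_n, x_m\rangle\Big| \le \sum_{n,m} |\langle x_n, x_m\rangle| \le \sum_{n,m} \|x_n\|\|x_m\| = \Big(\sum_n \|x_n\|\Big)^2.$$

This can only be an equality when $|\langle x_n, x_m\rangle| = \|x_n\|\|x_m\|$ for each $n, m$.

(5) Writing $x = \sum_n a_n v_n$ and $y = \sum_m b_m v_m$ for a basis $v_1, \ldots$, we find

$$\langle x, y\rangle = \sum_{n,m} \bar{a}_n b_m \langle v_n, v_m\rangle.$$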

(10) $(1, 1, 0, \ldots)$ and $(1, -1, 0, \ldots)$ do not satisfy the parallelogram law; write these
as step functions for $L^1$ and $L^\infty$.

(12) sin(x) cos(x) dx = | sin(2x)dx = [—008 2x1!^ = 0, and f Q l 2x 3 — 
x dx = j[x 4 — x 2 ]g = 0. 


(15) Substitute $\lambda = \alpha + i\beta$, then find the minimum by differentiating in $\alpha$, $\beta$ to get
$\lambda = -\langle x, y\rangle$.

(16) \\x„ x m || ^ ||x„ T !jn x m y m || > 0 since (x n x !n , y n y m ) = 0. 

(17) The 'inner product’ remains continuous, so Z is closed. 

10.15 (1) Answer -^(22x 0 + 2 y 0 - 6zo , xo + 19yo - 3zo, -6x 0 - 3yo + 27zo )• 
(2a) $Px \in M$ so $Px = \lambda y$, and $x - Px \in M^\perp$, so $\langle y, x - \lambda y\rangle = 0$. Expanding gives
$\lambda = \langle y, x\rangle$.
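As a small illustration of hint (2a), the sketch below projects a vector onto the line spanned by $y$ in $\mathbb{C}^2$. It assumes the inner product is conjugate-linear in its first argument, and the names `inner` and `project_onto_line` are illustrative. Dividing by $\langle y, y\rangle$ covers a non-unit $y$; for a unit vector the coefficient reduces to $\lambda = \langle y, x\rangle$ as in the hint.

```python
def inner(u, v):
    """Inner product on C^n, conjugate-linear in the first argument (assumed convention)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def project_onto_line(y, x):
    """Orthogonal projection of x onto the line spanned by y."""
    lam = inner(y, x) / inner(y, y)   # equals <y, x> when ||y|| = 1
    return [lam * c for c in y]

y = [3 / 5, 4j / 5]                   # a unit vector
x = [1 + 2j, -1j]

Px = project_onto_line(y, x)
residual = [a - b for a, b in zip(x, Px)]

# x - Px should be orthogonal to y, and projecting twice should change nothing
orthogonality_gap = abs(inner(y, residual))
PPx = project_onto_line(y, Px)
idempotence_gap = max(abs(a - b) for a, b in zip(Px, PPx))
print(orthogonality_gap, idempotence_gap)
```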

(3) Consider $x \in M^\perp$, and $x = a + b$ where $a \in M$, $b \in N$; since $N \subseteq M^\perp$ it
follows that $a = 0$.

(5) Any vector $x \in N$ can be written $x = a + b$ where $a \in M$, $b \in M^\perp$. Since
$M \subseteq N$, then $b = x - a \in N$ as well.

(6) Let $x = a + b$, $a \in M$, $b \in M^\perp$; then $Tx = Ta + Tb$, $Ta = Aa \in M$,
$Tb = Bb \in M^\perp$.

$$\|T\|^2 = \sup \frac{\|Ta\|^2 + \|Tb\|^2}{\|a\|^2 + \|b\|^2}$$

by Pythagoras’ theorem. But $\|Ta\| \le \|A\|\|a\|$ and $\|Tb\| \le \|B\|\|b\|$, so $\|T\|^2 \le
t\|A\|^2 + (1 - t)\|B\|^2$, where $t = \|a\|^2/(\|a\|^2 + \|b\|^2)$. Now take $t = 0$ or $t = 1$
depending on which is the maximum of the two.

(8b) Expand $d^2 \le \|x - y\|^2 = 2 - 2\operatorname{Re}\langle x, y\rangle$ with $y = e^{i\theta}v$.

(9c) If $\|x - a\| = d = \|x - b\|$ is the shortest distance from $x$ to $M$, then
$\|tx + (1 - t)x - ta - (1 - t)b\| = d$.

(9d) The closest sequence would be $\mathbf{1} \notin c_0$.

(10) $\|P_1y_{n+1}\| \le \|y_{n+1}\| \le \|y_n\|$, so $\|y_n\|$ converges. But in general, as $Py \perp
(y - Py)$, $\|y\|^2 = \|Py\|^2 + \|y - Py\|^2$, so $\|y_n - P_1y_n\| \to 0$, and similarly
$P_1y_n - P_2P_1y_n \to 0$. In finite dimensions, the bounded sequence $y_n$ has a convergent
subsequence, $y_{n_i} \to y$, so $y = P_1y = P_2P_1y$, and $y$ is in $\operatorname{im} P_1 \cap \operatorname{im} P_2$.

(11) $\sin x \approx 0.955 - 0.304x \approx -0.20 + 1.91x - 0.88x^2 + 0.093x^3$; $1 - x^3 \approx
1.13\cos x - 0.43\sin x$.
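The polynomial coefficients quoted in hint (11) can be reproduced by a least squares projection. The sketch below is not the book's method and rests on one assumption: that the approximation is taken in $L^2[0, 2\pi]$. The exercise statement is not reproduced here, but the quoted figures are consistent with that interval, where the linear fit is exactly $3/\pi - (3/\pi^2)x$. The helpers `simpson`, `solve` and `best_poly` are illustrative names; the normal equations use the Gram matrix of the monomials.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def best_poly(f, degree, a, b):
    """Coefficients (lowest power first) of the L^2[a, b] projection of f onto polynomials."""
    # Gram matrix <x^i, x^j> computed exactly; right-hand side <x^i, f> by quadrature
    G = [[(b ** (i + j + 1) - a ** (i + j + 1)) / (i + j + 1)
          for j in range(degree + 1)] for i in range(degree + 1)]
    rhs = [simpson(lambda x, i=i: x ** i * f(x), a, b) for i in range(degree + 1)]
    return solve(G, rhs)

linear = best_poly(math.sin, 1, 0.0, 2 * math.pi)
cubic = best_poly(math.sin, 3, 0.0, 2 * math.pi)
print([round(c, 3) for c in linear], [round(c, 3) for c in cubic])
```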

(12c) Answer: a = f p = 

10.18 (2) Check that $\|x^*\|$ satisfies the parallelogram law, then use the polarization
identity, noting that $(ix)^* = -ix^*$.

(3) $\phi$ corresponds to $Px$.

(4) The map x i— >■ ((x, )) is a functional so corresponds to some vector Tx. 

10.26 (2) $\|T\|^2 = \|T^*T\| \le \|T^*\|\|T\|$, so $\|T\| \le \|T^*\| \le \|T^{**}\| = \|T\|$.

(3) For $x = (a_n)$, $y = (b_n)$, $z = (c_n)$,

$$\langle z, yx\rangle = \sum_n \bar{c}_n b_n a_n = \sum_n \overline{\bar{b}_n c_n}\, a_n = \langle \bar{y}z, x\rangle.$$

(5) fd g(x)Vf(x) dx = fd [J g(x)f(t) dr d y = g(x)f(t ) d y dr. 

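(8) $T^*Tx = 0 \Rightarrow 0 = \langle x, T^*Tx\rangle = \langle Tx, Tx\rangle$.

(9) Fix a unit vector $u \in X$, $\lambda := \langle Tu, Tu\rangle > 0$, and let $v$ be any orthogonal
unit vector; then $\langle Tu, Tv\rangle = \langle u, v\rangle = 0$; similarly, $\langle T(u + v), T(u - v)\rangle =
\langle u + v, u - v\rangle = 0$, so $\langle Tv, Tv\rangle = \lambda > 0$ constant. For vectors $x = \alpha u$,
$y = \beta_1 u + \beta_2 v$, $\langle Tx, Ty\rangle = \bar{\alpha}\beta_1\lambda = \lambda\langle x, y\rangle$.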

(11) Answers: (-5/2, -2/3, 7/6), (-17, -5, 7)/3. 

(15) $T^\dagger T$ is the projection onto $(\ker T)^\perp$; $TT^\dagger$ is the projection onto im $T$.

(17) V*Vf = V*g is fd f* f(t) dr dx = fd g(x) dx. 

(18) Answer: $r = 0.497$m and $K/m = 0.0062$m$^{-1}$ (the actual values used to generate
the data were $r = 0.5$m and $K/m = 0.003$m$^{-1}$).

10.35 (1) Take the inner product of $\sum_n \alpha_n e_n = 0$ with $e_m$.

(4) $\langle (e_n, 0), (0, e_m)\rangle = \langle e_n, 0\rangle + \langle 0, e_m\rangle = 0$; if $x$ and $y$ can be approximated by
$x_N := \sum_{n=1}^{N} \alpha_n e_n$ and $y_M := \sum_{m=1}^{M} \beta_m e_m$ respectively, then

$$\|(x, y) - (x_N, y_M)\| = \|(x - x_N, y - y_M)\| = \sqrt{\|x - x_N\|^2 + \|y - y_M\|^2}$$

can be made small; note that $(x_N, y_M) = (x_N, 0) + (0, y_M) = \sum_{n=1}^{N} \alpha_n (e_n, 0) +
\sum_{m=1}^{M} \beta_m (0, e_m)$.

(5) $\langle x - x_*, e_n\rangle = 0$

(6) Suppose $e_n$ and $Ue_n$ are both orthonormal bases. Then, by Parseval’s identity,

$$\langle Ux, Uy\rangle = \sum_{n,m} \bar{\alpha}_n \beta_m \langle Ue_n, Ue_m\rangle = \langle x, y\rangle.$$

$U$ is onto because $y = \sum_n \alpha_n Ue_n = U(\sum_n \alpha_n e_n)$.
Conversely, if $\{e_n\}$ is an orthonormal basis for $H_1$, and $y \in \{Ue_n\}^\perp$, then $0 =
\langle y, Ue_n\rangle = \langle U^*y, e_n\rangle$ for all $n$, so $U^*y \in \{e_n\}^\perp = 0$ and $\|y\| = \|U^*y\| = 0$.

The column vectors of the matrix of $U$ are $Ue_n$, so $\langle Ue_n, Ue_m\rangle = \langle e_n, e_m\rangle = \delta_{nm}$.


(8) For example, take X 



. For the second part, substitute e m instead of 


x, and deduce orthogonality; if x e { e n then ||x|| = 0. 

(9) Show x — j ^ e 2n,nx , then take x = 1 /4. It is interesting to generate 

other series using other points and functions (e.g. \x\, x/\x\, | sinx|). 

(10) For / odd about 1/2, = — a„. In general, every / is the sum of an even 

and an odd function. 

11.7 (4b) Continuity of T(Sx ) := Tx: For any v e ker .S' and y e Y, \\Ty\\ = 
|| || = || T(x + u) || ^ c|| T|| ||x + u||, then use ||x + ker S|| ^ c||5x||. 

(5) |a„| = \\a„e n \\ < || X”=i a i e i ~ X"=i a i e i II < 2c|||x|||. 

11.15 (6) If $x_n \in B_r(0)$ then $Tx_n \in TB_r(0)$, so has a Cauchy subsequence, which
converges.

11.26 (4) The requirement is <p(x, y) — x + Xy, \x + Xy\ ^ |x| + \y\, so |k| ^ 1. 

(8) -*- 11 ) — 0, so (■ L< J>) _L = i 1 * . Now in the correspondence of t ] * with l °° , we get 
HO]] = coo and so [[OI = c 0 . 

(9) |0x| = \(p(x + a)\ ^ || 0|| || x + a|| for any a e M \ in fact this approaches equality 
forcertaina e M, so||0|| = ||0||. Onto: for any 0 e (X/M)*, let0x := i /s(x + M). 
Hint for the second part: the norm of ||0 + M 1 - 1| = inf^ eM ± 110 + 011 is the same 
as ||0 |mI|. 

11.32 (5) $(T^{\top\top}x^{**})\phi = x^{**}(T^\top\phi) = (T^\top\phi)x = \phi Tx$.

(7) If $T^\top$ is onto, then $T$ is 1-1 by (1) and has a closed image; if $T^\top$ is also 1-1, then
im $T$ is dense, hence $T$ is onto. If $T$ is onto, use the open mapping theorem.

11.42 (1) For $c_0$, a functional is of the type $y^\top$ where $y = (b_n) \in \ell^1$. Now $y \cdot e_n =
\sum_i b_i\delta_{ni} = b_n \to 0$ as $n \to \infty$ since $\ell^1 \subset c_0$.

(2) Use the functional $e_i^\top$: $e_i \cdot x_n = a_{ni}$. The converse is true for $\ell^p$, $1 < p < \infty$,
whose dual space has the Schauder basis $e_i^\top$; any $\phi \in \ell^{p*}$ can be approximated by
$\sum_{i=0}^{N} b_i e_i^\top$. So

$$\phi x_n \approx \sum_{i=0}^{N} b_i e_i \cdot x_n = \sum_{i=0}^{N} b_i a_{ni} \to \sum_{i=0}^{N} b_i a_i \approx \phi x.$$

For $\ell^1$, a functional is of the same type but $y \in \ell^\infty$. This time $y \cdot e_n = b_n$ need not
converge to 0, e.g. $y := \mathbf{1}$.

(4) If $T_n \rightharpoonup T$ and $T_n \rightharpoonup S$, then $\phi Tx = \phi Sx$ for all $\phi$ and $x$, so $T = S$.

(11b) $|\phi(T_nS_n - TS)x| \le \|\phi\|\|T_n\|\|(S_n - S)x\| + |\phi(T_nS - TS)x| \to 0$.

(15) If $x \notin M$, there is a $\phi \in X^*$ such that $\phi x = 1$, $\phi M = 0$, so $x$ is not a weak
limit point of $M$. More generally, every closed convex set is weakly closed, because
a hyperplane (so a functional) separates it from any point not in it.

12.12 (4)

$$\|o(h)\| = \|f(x + h) - f(x) - f'(x)h\| = \Big\|\int_0^1 f'(x + th)h - f'(x)h\,dt\Big\| \le \int_0^1 \|f'(x + th) - f'(x)\|\|h\|\,dt \le \tfrac{1}{2}k\|h\|^2.$$


12.21 (5) Poles and residues are (a) 1/2 ie, and —i : ie/2\ (b) 1 : (e, e 1 )/3, and 
(o:(e m , e _ ")/3(w 2 , and co 2 : ( e 0)2 , e -" 2 )/3tt>; (c) 0 : 1. 

13.3 (11) If T R is invertible, then P(ST ) — l = (T R)Q, so T is invertible. 

13.10 (2) Each vector $(a, b)$ corresponds to the matrix $\begin{pmatrix} a & 0\\ b & a + b\end{pmatrix}$.

(4) $1, A, \ldots, A^n$ cannot be linearly independent, so $A^m = p(A)$ must be true for
some polynomial $p$.

(10) This is a generalization of the convolution operation on $\ell^1$. The proofs are very
similar to that case, Exercise 9.7(2).

(13) For any $\phi$, $Tx\,\phi x = x\,\phi Tx$, i.e., $Tx = \lambda_x x$. So if $x$, $y$ are linearly dependent
then $Ty = \lambda_y y$, implying $Tx = \lambda_y x$ and $\lambda_y = \lambda_x$; if not, then $\lambda_y = \lambda_{x+y} = \lambda_x$.

(14d) If $S, T \in A''$, then $TR = RT$ for any $R \in A' \supseteq A''$, including $R = S$.

(16) To show Ia Q 1, let f e Ta and let K be a closed subset of [0, 1] \ A\ 
then for any x e K, one can find a function g x e T such that g x (x) > 1 in a 
neighborhood of x. By compactness of K , a finite number of such functions “cover” 

1 g(x) > 1 

g(x) g(*) < l’ 

a continuous function with !i\k — 1 and belonging to X (h — gk). By making K 
larger, one can find a sequence of functions such that h n g — > g, so g el 

(17) To show ||/ +1a\\ = ll/UII, it is required to find functions g n e Ta such 
that ||/ — g n II — > ll/UII- This can be done as follows: take B := [0, 1] \ U, where 
U = A + B € (0), and let h be a function such that h\A = 0, h\B — 1; so fh e 1 a 
yet / - fh = Oon B, and || / — fh\\ ||/|A|| as e 0^ 

(19) Multiplication is well-defined, for if S — S e T, T — T el, then ST — ST = 
(S — S)T + S(T — T) el. Associativity and distributivity follow from those of 
X. Suppose || S + A , j || — > || S + 1|| , || T + B n || — >■ || T + X|| , for some A n , B n e 1, 
then 


K, so g := gxi+- ■ ■ + g Xn i s greater than 1 on K. Let h(x) 


||6T +T|| < ||(5 + A n )(r + B„)|| < ||S + A„||||r + 5„|| -> ||S + J||||T + 2|| 

Finally, || 1 + 1|| < || 1 + 0|| = 1 yet || 1 + 1|| 0; but also in any normed algebra 

in which ||ST|| < HSH ||7'|| holds, 1 < ||1||, since ||1|| = ||1 2 || ^ ||1|| 2 . 
(24) X has a basis of two vectors, which can be taken to be 1



Multiplication by 1 acts of course as the identity matrix; if 



13.19 (1) Answers (a) 0, (b) 1, (c) max(|a|, \b\), (d) ( a ^ 0) 



, then 


/a l\" _ /a" na n _ „ / 1 n/a\ 

\0a) =\0 a" ) ~ 0 \0 1 J 

Now n {“^j (i) = so 1 ^ II (o ”{ | < V 2 + n 2 /a 2 (Example 7.9(2)). 

Taking the nth root gives (2 + n 2 /a 2 )V 2 « -> 1, so p(T) = \a\. Note how, in this 
case, ||7’"|| first increases then decreases to 0. Only (c) has p(T) = ||r||. 
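The behaviour described in part (d) can be seen numerically. The sketch below assumes the matrix is $T = \begin{pmatrix} a & 1\\ 0 & a\end{pmatrix}$ with real $0 < |a| < 1$ (the value $a = 0.9$ is chosen only for illustration) and uses the factorization $T^n = a^n\begin{pmatrix} 1 & n/a\\ 0 & 1\end{pmatrix}$ from the hint; `op_norm` computes the largest singular value of a real $2 \times 2$ matrix in closed form.

```python
import math

def op_norm(M):
    """Largest singular value of a real 2x2 matrix."""
    (p, q), (r, s) = M
    frob2 = p * p + q * q + r * r + s * s      # sum of squared singular values
    det = p * s - q * r                        # product of singular values (up to sign)
    disc = max(frob2 * frob2 - 4.0 * det * det, 0.0)
    return math.sqrt((frob2 + math.sqrt(disc)) / 2.0)

def norm_of_power(a, n):
    """||T^n|| for T = [[a, 1], [0, a]], using T^n = a^n [[1, n/a], [0, 1]]."""
    return abs(a) ** n * op_norm([[1.0, n / a], [0.0, 1.0]])

def nth_root_of_norm(a, n):
    """||T^n||^(1/n), computed without underflow for large n."""
    return abs(a) * op_norm([[1.0, n / a], [0.0, 1.0]]) ** (1.0 / n)

a = 0.9                                        # illustrative value with |a| < 1
norms = [norm_of_power(a, n) for n in range(1, 201)]
peak = norms.index(max(norms)) + 1             # the n at which ||T^n|| is largest
print(peak, norms[0], max(norms), norms[-1])
print([round(nth_root_of_norm(a, n), 4) for n in (10, 100, 1000, 10000)])
```

The norms rise to a peak before decaying geometrically, while the $n$th roots approach $|a|$, in line with $\rho(T) = |a| < \|T\|$.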

(3) Use the Cauchy inequality for $|x + ay| \le \sqrt{1 + |a|^2}\sqrt{|x|^2 + |y|^2}$.

(7) Let $R$ and $S$ be the radii of convergence of $\sum_n a_n z^n$ and $\sum_n b_n z^n$. Then
$\sum_n (a_n + b_n)z^n = \sum_n a_n z^n + \sum_n b_n z^n$ has radius of convergence at least $\min(R, S)$.
$\sum_n a_n b_n z^n$ has radius of convergence at least $RS$ since $\liminf |a_n b_n|^{-1/n} \ge \liminf |a_n|^{-1/n}\,
\liminf |b_n|^{-1/n}$.

(8) $f + g$ and $fg$ have coefficients $a_n + b_n$, $a_0b_n + a_1b_{n-1} + \cdots + a_nb_0$.

$$f \circ g(T) = a_0 + a_1 g(T) + a_2 g(T)^2 + \cdots = (a_0 + a_1b_0 + a_2b_0^2 + \cdots) + (a_1b_1 + 2a_2b_0b_1 + \cdots)T + (a_1b_2 + a_2b_1^2 + \cdots)T^2 + \cdots$$

(9) $\|f(T) - \sum_{n=0}^{N} a_n T^n\| = \|\sum_{n=N+1}^{\infty} a_n T^n\| \le \sum_{n=N+1}^{\infty} |a_n|\|T\|^n \to 0$ when
$\|T\| < R$.

(14) $\cos 0 = e^0 = 1$, but $\cos 2 = (\cos 1 - \sin 1)(\cos 1 + \sin 1) < 0$, so there is a
number $0 < \beta < 2$, $\cos\beta = 0$. Since the conjugate of $e^{i\theta}$ is $e^{-i\theta}$, it follows that
$|e^{i\theta}| = 1$, so $\sin\beta = 1$; hence $e^{i\beta} = i$ and $e^{4\beta i} = 1$.

(17) Expand e a ' s e aiT e a ' iT e a ^ T to second order, and equate with e s+T ss 1 + (S + 
T) + (S + T) 2 /2, to get q! 2 Q !3 = 1 / 2; the two values can be chosen to be equal. 
13.25 (2) $f(t)g(t) = 1 \iff f(t) = 1/g(t)$, $g(t) \ne 0$, $\forall t \in [0, 1]$; $g$ has a minimum
distance to the origin (Exercise 6.22(10)), so $f = 1/g$ is also bounded.

(4) $\|T^{-1}\| = \sup_x \|T^{-1}x\|/\|x\| = \sup_y \|y\|/\|Ty\|$.

(8)

$$e^{(t+s)T} = e^{tT + sT} = e^{tT}e^{sT} \quad\text{since } (tT)(sT) = (sT)(tT),$$
$$e^{(t+h)T} = e^{tT}e^{hT} = e^{tT}(1 + hT + o(h)),$$

so the derivative at $t$ is $e^{tT}T$.
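Both identities in hint (8) can be checked on a small matrix. The sketch below sums the exponential series for $2 \times 2$ real matrices and compares $e^{(t+s)T}$ with $e^{tT}e^{sT}$, and a difference quotient with $e^{tT}T$. The matrix `T`, the step `h` and the helper names are arbitrary illustrative choices, not taken from the exercise.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_scale(c, A):
    return [[c * x for x in row] for row in A]

def mat_exp(A, terms=40):
    """Truncated power series 1 + A + A^2/2! + ... for a 2x2 matrix."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = mat_scale(1.0 / n, mat_mul(term, A))   # term is now A^n / n!
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def max_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))

T = [[0.0, 1.0], [-2.0, -3.0]]
t, s, h = 0.7, 0.5, 1e-6

# e^{(t+s)T} versus e^{tT} e^{sT}
semigroup_gap = max_diff(mat_exp(mat_scale(t + s, T)),
                         mat_mul(mat_exp(mat_scale(t, T)), mat_exp(mat_scale(s, T))))

# (e^{(t+h)T} - e^{tT}) / h versus e^{tT} T
E_t = mat_exp(mat_scale(t, T))
E_th = mat_exp(mat_scale(t + h, T))
quotient = [[(E_th[i][j] - E_t[i][j]) / h for j in range(2)] for i in range(2)]
derivative_gap = max_diff(quotient, mat_mul(E_t, T))
print(semigroup_gap, derivative_gap)
```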

(12) $SR = 0$ for $S(a_0, a_1, \ldots) := (a_0, 0, \ldots)$. But $\|RT\| = \|T\|$ for all $T$, so
$RT_n \not\to 0$ when $T_n$ are unit elements.

13.31 (2) If $|f(z)| \le c|z|^{m/n} \le c|z|^k$, then $f$ is still a polynomial.

(8) If $a$ is a zero or pole of order $\pm N$, then $(z - a)^{\mp N} f(z)$ is analytic and non-zero
at $a$. Thus $qf/p$ is bounded analytic on $\mathbb{C}$, so must be constant.

14.7 (2) $f(t) - \lambda$ is not invertible precisely when $f(t_0) - \lambda = 0$ for some $t_0 \in [0, 1]$.

(4) $T^2 - z^2 = (T - z)(T + z)$, so $z^2 = \lambda \in \sigma(T^2) \Rightarrow \sqrt{\lambda} = \pm z \in \sigma(T)$ (one
of them). Conversely, if $T^2 - z^2$ has an inverse $S$, then $S(T + z)(T - z) = 1 =
(T - z)(T + z)S$, so $T - z$ is invertible.

(7) $(S, T) - \lambda(1, 1) = (S - \lambda, T - \lambda)$ is not invertible iff $S - \lambda$ or $T - \lambda$ is not
invertible.

(8) The map $T \oplus S - \lambda : (x, y) \mapsto (Tx - \lambda x, Sy - \lambda y)$ is invertible exactly when
$T - \lambda$ and $S - \lambda$ are invertible.

14.13 (1) $Rx = \lambda x$ means $a_n = \lambda a_{n+1}$, so $a_n = a_0/\lambda^n$; but also $0 = \lambda a_0$. There are
no non-zero solutions to these algebraic equations.

(3) $\ell^1$ is embedded in $\ell^1(\mathbb{Z})$, so $\sigma(L)$ decreases from the first case to the second.
In fact, in $\ell^1(\mathbb{Z})$, there are no eigenvalues, because $\sum_{n=-\infty}^{\infty} |\lambda|^n$ cannot converge
for any $\lambda$. Yet the boundary of $\sigma(T)$ in $\ell^1$, consisting of generalized eigenvalues, is
preserved in $\ell^1(\mathbb{Z})$.

(4) T T x = («o, 02 , « 3 , . . .) on i 1 . 

(5) T J x = («o, ai, C 13 / 2 , . . .) on £*. 

(9) The operator $(T - \lambda)f(x) = (x - \lambda)f(x)$ is invertible only when $\lambda \notin [0, 1]$.
There are no eigenvalues because $xf(x) = \lambda f(x)$ for all $x$ implies $f = 0$. The image
of $T - \lambda$ is a subset of $\{g \in C[0, 1] : g(\lambda) = 0\}$; as this set is closed and not $C[0, 1]$,
all $\lambda \in [0, 1]$ are residual spectral values.

(10) Induction on $n$: Expand $VV^nf$ as a double integral and change the order of
integration.

(12) $1 - |\lambda| \le \|Tx_n - \lambda x_n\| \to 0$; $T - \lambda = T(1 - \lambda T^{-1})$, so $|\lambda| < 1 \Rightarrow \lambda \notin \sigma(T)$.
The boundary of $\sigma(T)$ must be part of the circle.

(13) $T$ is 1-1 with a closed image $\iff \|Tx\| \ge c\|x\|$, so

$$\|(T + H)x\| \ge \|Tx\| - \|Hx\| \ge (c - \|H\|)\|x\|$$

shows $T$ is an interior point of the set.

14.21 (7) The eigenvalue equation for ML is a n+ \ = nXa n , so a n — n\ A"xo —*■ 00 . 
For RM,{ 0} = o p i(RM) T ) c o r iRM). 

14.29 (3) e a( - T ^ = oie T ) = <x ( 1 ) = { 1 }, so er(T) C 2 niZ. For an idempotent P, 
e l7lP = 1 + Pilni + &%£ + ■■■) = 1 + Pie 2ni - 1) = 1. 

14.40 (11) C' v is generated by <?,, where e,e j = 0 when i j, and c,e, = I . So 
a character satisfies Se,Se / = 0 and Se, = ±1. If 8 e\ = ±1, say, then Se ,• = 0 for 
i + 1. In fact 1 = 5(1) = X/ «(«f) = *(«i). 

(12) Bi C 2 ) is generated by ^ ^ ^Y and ^ A character S maps 

them to w\, ..., W 4 , which must satisfy w\ — 0, w 2 = 0, W 2 VJ 3 = w 1 , W 3 W 2 — um, 
for which there are no non-zero solutions. 

(15) x acts on the N points in A as (5,x) = (x,) = x. 
(17) In the commutative Banach algebra y := { S,T }", the spectra remain the same,
ay(A ) = er(A), so A(A) = ct(A) and the inclusions follow from A (5 + T) C 
A(S) + A (T) and A (ST) c A(S)A(T). 

15.3 (4) T is left- and right-invertible: TT*R = 1 = R'T*T. 

(7) Use Theorem 13.9; note that L*L = a e R, so a = X 2 . 

15.11 (5) What is meant is that if $T \in X$ is normal, and $J$ is a *-morphism, then
$J(T) \in Y$ is also normal, etc.

(8) The inverse of $T_a$ is $T_{-a}$, which is the adjoint:

$$\int \overline{g(x)}\,T_af(x)\,dx = \int \overline{g(x)}\,f(x - a)\,dx = \int \overline{g(t + a)}\,f(t)\,dt = \int \overline{T_{-a}g(t)}\,f(t)\,dt.$$

(20) $\|(T^*T)^n\|^{1/2n} = \|(A^*A)^n + (B^*B)^n\|^{1/2n} \le (\|A\|^{2n} + \|B\|^{2n})^{1/2n} \to
\max(\|A\|, \|B\|)$.

(22) If $T^*T$ is idempotent, then $\sigma(TT^*) \subseteq \sigma(T^*T) \cup \{0\} \subseteq \{0, 1\}$. Hence
$\sigma(TT^*TT^* - TT^*) = \{0\}$.

15.15 (1) $\langle x, TT^*x\rangle = \|T^*x\|^2 = \|Tx\|^2 = \langle x, T^*Tx\rangle$ and use Example 10.7(3).

(3) $\|T_n^*x - T^*x\| = \|(T_n - T)^*x\| = \|T_nx - Tx\| \to 0$. Conversely, take the limit
of $\|T_n^*x\| = \|T_nx\|$ and use Exercise 1.

(4) $|\lambda|^2\|x\|^2 = \|\lambda x\|^2 = \|Ux\|^2 = \|x\|^2$.

(5) Each distinct eigenvalue comes with an orthogonal eigenvector. In a separable
space, there can only be a countable number of these.

(6) $\langle e_m, T^*e_n\rangle = \langle Te_m, e_n\rangle = \bar{\lambda}_n\delta_{nm}$, so $T^*e_n = \sum_m \langle e_m, T^*e_n\rangle e_m = \bar{\lambda}_n e_n$. Then
show $\|T^*x\| = \|Tx\|$.

(8) For (b), note that $\sigma(I - T^n) = 1 - \sigma(T)^n \subseteq B_1[1]$, so $\|I - T^n\| = \rho(I - T^n) \le
2$. For (c) use $H = \ker(T^* - I) \oplus \ker(T^* - I)^\perp = \ker(T - I) \oplus \overline{\operatorname{im}(T - I)}$.

15.20 (1) Use Example 10.7(3).

(3b) Let $M$, $M^\perp$ be the domains of $A$ and $D$. For any $x = a + b \in M \oplus M^\perp$,

$$\langle x, Tx\rangle = \langle a + b, Ta + Tb\rangle = \langle a, Ta\rangle + \langle b, Tb\rangle,$$
$$\langle x, x\rangle = \langle a + b, a + b\rangle = \|a\|^2 + \|b\|^2.$$

As $\langle a, Ta\rangle = \|a\|^2\lambda$ with $\lambda \in W(A)$, and similarly $\langle b, Tb\rangle = \|b\|^2\mu$, $\mu \in W(D)$,
the values of $\langle x, Tx\rangle/\|x\|^2$ include the line between $\lambda$ and $\mu$. The collection of
these lines is the convex hull of $W(A) \cup W(D)$.

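(4b) For $T = \begin{pmatrix} a & 1\\ 0 & a\end{pmatrix}$ let $x = \binom{\alpha}{\beta}$; then $\langle x, Tx\rangle = |\alpha|^2a + \bar{\alpha}\beta + |\beta|^2a = a + \bar{\alpha}\beta$,
because of the condition $1 = \|x\|^2 = |\alpha|^2 + |\beta|^2$. But $\bar{\alpha}\beta = \cos t\,\sin t\,e^{i\theta}$ takes the
value of any complex number in the closed ball $B_{1/2}[0]$.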
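The claim in hint (4b) can be sampled numerically. The sketch below assumes the matrix in question is $\begin{pmatrix} a & 1\\ 0 & a\end{pmatrix}$ (the matrix itself is not legible in this hint, so this is an inference from the computed value $a + \bar{\alpha}\beta$) and that the inner product is conjugate-linear in the first argument. It evaluates $\langle x, Tx\rangle$ over a grid of unit vectors $x = (\cos t, \sin t\,e^{i\theta})$; the value of `a` is an arbitrary illustrative choice.

```python
import cmath
import math

def quad_form(a, alpha, beta):
    """<x, Tx> for T = [[a, 1], [0, a]] and x = (alpha, beta)."""
    Tx = (a * alpha + beta, a * beta)
    return alpha.conjugate() * Tx[0] + beta.conjugate() * Tx[1]

a = 2 - 1j                                   # illustrative value
offsets = []                                 # samples of <x, Tx> - a
for i in range(0, 91):
    t = math.pi * i / 180                    # t ranges over [0, pi/2]
    for k in range(0, 36):
        theta = 2 * math.pi * k / 36
        alpha = complex(math.cos(t))
        beta = math.sin(t) * cmath.exp(1j * theta)
        offsets.append(quad_form(a, alpha, beta) - a)

largest = max(abs(z) for z in offsets)
print(largest)
```

Every sample lies within distance $1/2$ of $a$, and the bound is attained at $t = \pi/4$, consistent with the numerical range being the closed ball of radius $1/2$ about $a$.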

(1 lc) Let X := (T) x , so 0 = cr| = a T-x ~ II T — X\\ 2 . 
15.25 (1) The singular values are (i) 4 with singular vectors proportional to



and 1 with 


(_ 2 1 )-(_ 1 2 ) ;(ii )^ with 0>@* 


and 1 with 



(7) Let S := T/X where X is the largest eigenvalue (in the sense of magnitude); it 
has the same eigenvectors e„ as T except with eigenvalues /i„ := X n /X. If i'o = 
Z/I a n e n + V’ where y e ker ( r “ k ) then S kv 0 = Xn Hn a ne n + V- So 


\\S k vo-y\\ 2 * = Y J \Vn\ 2k \a n \ 2 ^c 2k \\vo\\ 2 , (0 < c < 1) 

and S k vo^ y as oo. Hence y/\\y\\, a ndv k+1 « 

the sequence does not converge unless X = |A.| but behaves like e lke y/\\y\\. 
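Hint (7) describes the power method. A minimal sketch follows, using an arbitrary symmetric $2 \times 2$ matrix chosen for illustration (not taken from the exercise): normalized iterates $v_{k+1} = Tv_k/\|Tv_k\|$ align with the eigenvector of the eigenvalue of largest magnitude. When that eigenvalue is negative, the iterates flip sign at every step instead of converging, which is the behaviour the hint points out for $\lambda \ne |\lambda|$.

```python
import math

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def power_method(A, v, steps):
    """Return the normalized iterate after the given number of steps."""
    for _ in range(steps):
        w = mat_vec(A, v)
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

A = [[2.0, 1.0], [1.0, 3.0]]                 # eigenvalues (5 +/- sqrt(5)) / 2
v = power_method(A, [1.0, 0.0], 60)
rayleigh = sum(x * y for x, y in zip(v, mat_vec(A, v)))   # estimate of the top eigenvalue

# With dominant eigenvalue negative, consecutive iterates point in opposite directions
B = [[-2.0, -1.0], [-1.0, -3.0]]
u60 = power_method(B, [1.0, 0.0], 60)
u61 = power_method(B, [1.0, 0.0], 61)
alignment = sum(x * y for x, y in zip(u60, u61))
print(rayleigh, alignment)
```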

15.34 (6) Answers: (b) eigenvalues 1 /(« + \)n, eigenvectors sin(n + j )nx ; (c) 
l/(n + ^) 2 7r 2 , sin(n + j )nx ; so 


z 


1 

(n + j ) 4 7 t 4 


t l 


min(.x, t/)“ dj/dx = 1/6. 


o o 


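15.39 (3) If $\phi A = 0$ for all $\phi \in \mathcal{S}$ and $A$ is self-adjoint, then $\sigma(A) \subseteq \mathcal{S}(A) = \{0\}$
and $A = 0$.

(7) $\sigma(T) \subseteq B_1[0]$, and $\sigma(T)^{-1} = \sigma(T^{-1}) \subseteq B_1[0]$.

(8) By the spectral mapping theorem $\{0\} = \sigma(P^2 - P) = \{\lambda^2 - \lambda : \lambda \in \sigma(P)\}$,
so $\lambda = 0, 1$.

15.44 (4) Let $T = A + iB$ with $A$, $B$ self-adjoint. Then $A \ge 0$ implies $\mathcal{S}(T) \subseteq
\mathcal{S}(A) + i\mathcal{S}(B) \subseteq \mathbb{R}^+ + i\mathbb{R}$. Conversely, $A = (T + T^*)/2$, so $\phi A = (\phi T + \overline{\phi T})/2 =
\operatorname{Re}\phi T \ge 0$ for $\phi \in \mathcal{S}$.

15.49 (4) $|T^*|^2 = T|T|U^* = TU^*U|T|U^* = (TU^*)^2$.

(10) $|T|$ is invertible, so let $U := T|T|^{-1}$; it is unitary, e.g. $UU^* = T|T|^{-2}T^* =
TT^{-1}T^{*-1}T^* = 1$.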

(11) (b) T = U\T\ = S|T|z, where := U\T\l e US , so tr(T) = tr(5|T|z) is 
independent of the basis. 


(c) | tr(5T)| = | tr(S£/|7’|)| = |<f/|T|z, S*m 2 >wsl 


< liuirpllwsll^ir^llws < lis* 


\T\n ns = I|S|| ||T| 


Tr 


I 2 

(d) The norm axioms are satisfied because II T ll Tr — III 7 ’! 2 Whs and 


tr |S + T\ = tr U*(S + T) = (U , S) ns + (U , T) ns < ||S|| W5 + ||71 W5 . 


Also, 
lir*|| Tr = trc/r* = ut*u = tr|r|. 

(e) |T| = U*T = CB where C := U* A e US. So tr|T| = (C*,B) < 

IIA|| W 5 ||B|| W5 . 

(f) If e n . e' n are the singular vectors of T, then T\ 2 e„ = \'a„ \ 2 e n . Take the polar 
decomposition of {e' n , T e n ) = e ,e "\{e' n , Te n ) |, and let Ue n := e ,e "e' n . Then 

Z l< e "’ Te ^\ = Z U * Te »> = < II T II Tr 


If Te n = Ke' n , then ||T|| Tr = (4, T e „) = 


Glossary of Symbols


$\to$                       Converges to
$\rightharpoonup$           Weak convergence
$\|\cdot\|_X$               Norm of space $X$
$\langle\cdot,\cdot\rangle_X$   Inner product of space $X$
$1_E$                       Characteristic function on $E$
$\sum_n a_n$                A series of terms
$[a_n]$                     Equivalence class of sequence $(a_n)$
$T^*$                       Hilbert adjoint of an operator $T$, or the involute of an algebra element
$T^\top$                    Adjoint of an operator $T$
$x^\top$                    Dual of a sequence $x$
$\widehat{T}$               Gelfand/Fourier transform of $T$
$[\![A]\!]$                 Span of vectors in $A$
$A^c$                       Complement of set $A$
$A'$                        Commutant algebra of $A$
$A^\circ$                   Interior of set $A$
$A^\perp$                   Annihilator or orthogonal complement of $A$
${}^\perp\!A$               Pre-annihilator of $A$
$X^*$                       Dual space of $X$, or set of conjugates of $X$
$\partial A$                Boundary of set $A$
$\bar{A}$                   Closure of set $A$
$xy$                        Multiplication of sequences
$x \cdot y$                 Dot product of sequences
$x * y$                     Convolution of sequences or functions
$A + B$                     Addition of sets
$A \oplus B$                Direct sum of subspaces
$X \cong Y$                 Isomorphic spaces
$X \equiv Y$                Isometric spaces
$X \subset Y$               $X$ is embedded in $Y$
$X/M$                       Quotient space of $X$ by $M$
$B(X)$                      Space $B(X, X)$
$B(X, Y)$                   Space of continuous linear operators $X \to Y$
$B_r(a)$                    Ball of radius $r$, center $a$
$B_r[a]$                    Closed ball
$B_X$                       Unit open ball of $X$
$c$                         Space of convergent sequences
$c_0$                       Space of sequences that converge to zero
$C(X)$                      Space $C_b(X, \mathbb{C})$
$C_b(X, Y)$                 Space of bounded continuous functions $f : X \to Y$
$C^n(\mathbb{R}, X)$        Space of $n$-times continuously differentiable functions
$C^\omega(A)$               Space of analytic functions on $A$
$\mathbb{C}[x, y]$          Space of polynomials in $x$, $y$
codim $A$                   Codimension of subspace $A$
$d$                         Distance function
$D$                         Differentiation operator
$D(U, Y)$                   Set of differentiable functions
$D_1$                       “Taxicab” distance on $X \times Y$
$D_\infty$                  Max distance on $X \times Y$
$\Delta$                    Character space of $X$
$\delta_x$                  Dirac functional
dim $X$                     Dimension of space $X$
$\mathbb{F}$                A field, usually $\mathbb{R}$ or $\mathbb{C}$
$\mathcal{G}(X)$            Group of invertibles of $X$
$I$                         Identity operator
im $T$                      Image of a linear map $T$
index$(T)$                  Index of a Fredholm operator $T$
$\mathcal{J}$               Radical of an algebra
ker $T$                     Kernel or null space of a linear map $T$
$L$                         Left-shift operator
$\ell^p$                    Space of sequences with the $p$-norm
$L^p(A)$                    Space of functions on $A$ with the $p$-norm
$\lim_{n\to\infty}$         Limit as $n \to \infty$
$m$                         Lebesgue measure on $\mathbb{R}$
$M_a$                       Multiplication operator by $a$
$\mathcal{S}(X)$            State space of an algebra
$R$                         Right-shift operator
$\rho(T)$                   Spectral radius of $T$
$\sigma(T)$                 Spectrum of $T$
$T_a$                       Translation by $a$
tr$(T)$                     Trace of $T$
$W(T)$                      Numerical range of $T$

Further Reading 


Functional analysis impinges upon a wide range of mathematical branches, from 
linear algebra to differential equations, probability, number theory, and optimization, 
to name just a few, as well as such varied applications as financial investment/risk 
theory, bioinformatics, control engineering, quantum physics, etc. 

As an example of how functional analysis techniques can be used to simplify
classical theorems, consider Picard’s theorem for ordinary differential equations. The
differential equation $y' = F(x, y)$, $y(a) = y_a$, is equivalent to the integral equation
$y(x) = T(y) := y_a + \int_a^x F(s, y(s))\,ds$. It is not hard to show that if $F$ is Lipschitz
in $y$ and continuous in $x$, then $T$ is a contraction map on $C[a - h, a + h]$ for some
$h > 0$, and the Banach fixed point theorem then implies that the equation has a
unique solution locally.
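The contraction argument can be watched in action with Picard iteration. The sketch below is a rough discretization under stated assumptions: the test equation is $y' = y$, $y(0) = 1$ on the one-sided interval $[0, 1/2]$ (chosen so that the exact solution $e^x$ is known), the integral in $T$ is replaced by the trapezoid rule on a grid, and the names `picard_step` and `sup_dist` are illustrative.

```python
import math

def picard_step(F, xs, ys, y_a):
    """Apply T(y)(x) = y_a + integral from a to x of F(s, y(s)) ds on a grid (trapezoid rule)."""
    out = [y_a]
    for i in range(1, len(xs)):
        h = xs[i] - xs[i - 1]
        area = 0.5 * h * (F(xs[i - 1], ys[i - 1]) + F(xs[i], ys[i]))
        out.append(out[-1] + area)
    return out

def sup_dist(u, v):
    """Sup-norm distance between two grid functions."""
    return max(abs(p - q) for p, q in zip(u, v))

F = lambda x, y: y            # y' = y, Lipschitz constant 1 in y
a, h, y_a = 0.0, 0.5, 1.0
n = 200
xs = [a + h * i / n for i in range(n + 1)]

ys = [y_a] * len(xs)          # start from the constant function y_a
gaps = []                     # sup distance between successive iterates
for _ in range(25):
    new = picard_step(F, xs, ys, y_a)
    gaps.append(sup_dist(new, ys))
    ys = new

error = max(abs(y - math.exp(x)) for x, y in zip(xs, ys))
print(gaps[:4], error)
```

With Lipschitz constant 1 and interval length $1/2$, each step at least halves the gap between successive iterates, which is the contraction property the Banach fixed point theorem needs.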

However, the classical derivative operator is in many ways inadequate: its domain
is not complete and it is unbounded on several norms of interest. But there is a way
to extend differentiation to much larger spaces, namely Sobolev spaces and Distributions.
The former are Banach spaces $L^p_s$ of functions that have certain grades
of integrability ($p$) and differentiability ($s$), while the latter are spaces of functionals
that act on them with weak*-convergence. Distributions include all the familiar
functions in $L^1_{\mathrm{loc}}$, but also other ‘singular’ ones, such as Dirac’s delta ‘function’ $\delta$
and $1/x^n$. Differentiation can be extended as a continuous operator on these spaces,
e.g. $L^p_s \to L^p_{s-1}$. Moreover, distributions can be differentiated infinitely many times;
for example, the derivative of the discontinuous Heaviside function $1_{\mathbb{R}^+}$ is $\delta$. But,
in general, ‘singular’ distributions cannot be multiplied together. A central result
is the Sobolev inequality, $\|u\|_{L^q(\mathbb{R}^n)} \le C_{n,p}\|Du\|_{L^p(\mathbb{R}^n)}$, for $n \ge 2$, $\frac{1}{q} = \frac{1}{p} - \frac{1}{n}$,
which implies that the identity map $L^p_s(\mathbb{R}^n) \to L^q_{s-1}(\mathbb{R}^n)$, along the arrows in Fig. 1,
is continuous. The study of operators on such generalized spaces is of fundamental
importance: from extensions of the convolution and the Fourier transform, to
pseudo-differential operators of the type $f(x, D)$, singular integrals, and various
other transforms (see [12, 26, 28]).

Fig. 1 Sobolev Spaces

Although unbounded, classical differential operators are normal ‘closed operators’:
these have a graph $\{(x, Tx) : x \in X\}$ which is closed in $X \times X$. Quite
a lot of the spectral theory extends in modified form to them. For example, their
spectrum remains closed but not necessarily bounded. So, if one inverts in a point
$\lambda \notin \sigma(T)$ then $(T - \lambda)^{-1}$ becomes a regular continuous operator, which can often be
expressed as an integral operator, whose kernel is called its Green’s function. Indeed,
it turns out that ‘elliptic’ differential operators become Fredholm self-adjoint operators
under this inversion. This immediately gives certain results, usually falling under
the heading of Sturm-Liouville theory, such as that the spectrum of the Laplace operator
$-\Delta$ on a compact shape in $\mathbb{R}^n$ is an unbounded sequence of isolated positive
eigenvalues, called the “resonant frequencies” or “harmonics” of the shape. Deeper
results include the Atiyah-Singer index theorem: the Fredholm index of an elliptic
differential operator is equal to a certain topological invariant of the domain.

The concept of a Banach space can be generalized to a topological vector space, 
namely a vector space with a topology that makes its operations continuous. Many 
theorems continue to hold at least for “locally convex topological vector spaces”, 
including the Hahn-Banach theorem, the open mapping theorem, and the uniform 
boundedness theorem. Other important results are Schauder’s fixed point theorem, the 
Krein-Milman theorem, the analytic Fredholm index theorem, and the Hille-Yosida 
theorem. 

Harmonic analysis is the study of general (but usually locally compact) group
algebras, especially the Fourier transform. The central results are the Pontryagin
duality theorem, which asserts that the character space of $L^1(G)$ is itself a group that
is ‘dual’ to $G$, and the Peter-Weyl theorem. Von Neumann algebras are *-algebras that
arise as double commutants of C*-algebras. Equivalently, they are the weakly closed
*-subalgebras of $B(H)$. The spectral theorem holds for them. There is a lot of theory
devoted to their structure, and a complete classification is still an open problem.

One must also include some outstanding conjectures: whether every operator on
a separable Hilbert space has a non-trivial closed invariant subspace; whether every
infinite-dimensional Banach space admits a quotient which is infinite-dimensional
and separable; Selberg’s conjecture about the first eigenvalue of a specific Laplace-Beltrami
operator on Maass waveforms; the Hilbert-Pólya conjecture that the non-trivial
zeros of the Riemann zeta function are the eigenvalues of some unbounded
operator $\frac{1}{2} + iA$ with $A$ self-adjoint; etc.
Index


A 

Absolute convergence, 108, 109, 203, 289 
Adjoint 

operator, 240, 244, 346 
space, see dual space 
Adjoint operator, 188 
Algebra 

division, 310 
simple, 367 

Analytic function, 267, 300, 325
Annihilator, 237 

Approximate eigenvalue, 314, 355 
Approximation of identity, 163 
Archimedean property, 10, 30 
Arzela-Ascoli theorem, 83 
Ascending sequence of eigenspaces, 318 
Auto-correlation, 168 
Automorphism, 282, 284, 298, 347 
Axiom of choice, 10, 31 


B 

Baire, 47 

category theorem, 46, 221, 247 
Ball, 10, 16, 66, 99, 136, 178 
closed, 23, 251
Banach, 106 
algebra, 277 

commutative, 281, 311, 339, 348 
morphism, 282 
semi-simple, 331 
fixed point theorem, 51
space, 105, 122, 221
Banach-Alaoglu theorem, 251
Banach-Mazur theorem, 225 
Banach-Steinhaus’s theorem, 246 
Basis, 91 


dual, 187, 225, 226, 341 
Hamel, 209 

orthonormal, 201, 355, 361 
Schauder, 110, 118 
Bertrand’s convergence test, 112 
Bessel 

functions, 208 
inequality, 203 
Binomial theorem, 303 
Bolzano-Weierstrass property, 76
Boundary, 17, 58, 100, 297 
Bounded map, see operator, continuous 
Bounded set, 65, 101, 104 
totally, 67, 104, 136 
Bounded variation, 164 


C 

C*-algebra, 345 
commutative, 375 
Cantor 

nested set theorem, 77 
set, 24, 25, 77 
Cauchy, 271

convergence test, 111
inequality, 98, 173
integral formula, 272
residue theorem, 271
sequence, 38, 49, 67, 68 
theorem, 269 

Cauchy-Hadamard’s theorem, 289 
Cauchy-Riemann equations, 268 
Cauchy-Schwarz inequality, 173, 357 
Cayley transformation, 352
Cayley-Hamilton theorem, 325 
Center of an algebra, 281, 347 
Centralizer, 286 
Cesàro sum, 113
Chain, 92 

Character, 282, 334 
Characteristic 
function, 35 
polynomial, 307 
Chebyshev polynomials, 208 
Closed graph theorem, 223 
Closed range theorem, 242 
Closed set, 22, 29, 45, 67, 71 
Closure, 23, 69, 102, 286 
Codimension, 133 
Coercive operator, 360 
Commutant, 286 
Commutative 

Banach algebra, 281, 311, 339, 348 
C*-algebra, 377
Commutator, 282, 338 
Compact 

operator, 226, 321, 346, 369, 383 
set, 70, 74, 104, 137, 309 
Complementary subspace, 180, 224 
Complete metric space, 37, 40 
Completeness of
$c$, 141
$\mathbb{C}^N$, 135
$C(X, Y)$, 78
$L^1$, 159
$\ell^1$, 144
$\ell^2$, 147
$L^\infty$, 157
$\ell^\infty$, 140
$\ell^p$, 153

Completion, 43, 49, 107, 174, 238 
Complex numbers, 14, 48, 346 
Complexification, 217 
Component, 63, 297 
Condition number, 130 
Conformal, 199 
Conjugate 

gradient algorithm, 218
space, see dual space 
Connected set, 57, 105 
component, 63 
path-connected, 64, 104, 296 
Continuity, 31

of inner product, 173 
of norm, 101 
Continuous 

functions, 78, 106, 278, 346 
spectrum, 313 
Contraction map, 50, 51 
Convergence, 27, 31 


absolute, 108, 109, 203, 289 

in norm, 246 

linear, 29 

pointwise, 78, 246 

quadratic, 29 

strong, 246 

uniform, 80 

weak, 246, 248 

weak*, 248 

Convex set, 91, 99, 179, 357 
Convolution, 121, 146, 164, 278, 341 
Coset, 131, 183 
Cross-correlation, 168 
Curve, 259 


D 

Deconvolution, 196 
Delta function, 121 
Dense set, 25, 33, 53, 162, 182 
Descending sequence of spaces, 318 
Determinant, 296 
Diameter of a set, 65 
Differentiation, 122, 257, 278, 291 
product rule, 286 
Dimension, 91 
Direct sum, 91
Dirichlet kernel, 248 
Disconnected set, 57 
totally, 57 

Discrete metric space, 15, 22, 68, 76 
Distance, 13, 33 
between sets, 24 
inherited, 15 
Division algebra, 310 
Divisor of zero, 279, 313, 347 
topological, 297, 313, 347 
Dot product, see inner product 
Dual 

basis, 187, 225, 226, 341 
operator, see adjoint operator 
space, 115, 122, 185, 231 
double dual, 238, 245 


E 

Eigenspace, 313 
Eigenvalue, 313, 322, 329, 354 
approximate, 314, 355 
Eigenvector, 313, 318, 354 
Elliptic operator, 360 
Embedding, 128, 238, 280, 283, 389 
Equicontinuous function, 83 
Ergodic Theorem, 355
Euclidean distance, 14 
Euclidean space, 14, 48, 69, 94, 135, 172,
174, 249, 278, 287, 341, 346
Exponential function, 342 
Extension of a 

functional, see Hahn-Banach theorem 
uniformly continuous function, 49 
Exterior, 17 
point, 17 

F 

Factor space, see quotient space 
Fibonacci sequence, 168 
Filter, 165 

Finite-dimensional vector space, 91, 136 
Fixed point, 51
Fourier, 206 

series, 165, 205 

transform, 121, 168, 242, 342, 378 
Frechet, 14 
Fredholm, 320 
alternative, 320 
operator, 229 

Frequency-time orthonormal bases, 211 
Function 

composition of, 33, 49 
continuous, 60, 66, 72 
contraction, 50, 51 
equicontinuous, 83 
equivalence or bi-Lipschitz, 50 
inverse of, 35 
Lipschitz, 50, 67, 85, 116
open, 35 

spaces, 96, 154, 172, 205, 335 
Functional, 115 
Functional calculus, 325, 375 
Fundamental sequence, see Cauchy seq. 
Fundamental theorem of 
algebra, 310 
calculus, 262 


G 

Gauss’s convergence test, 112
Gauss-Seidel algorithm, 127 
Gaussian quadrature, 215 
Gelfand, 337 

transform, 337, 376 
Gelfand-Mazur theorem, 310 
Gelfand-Naimark theorem, 389 
Generalized nilpotent, see quasinilpotent 
Generating functions, 341 


Gershgorin’s theorem, 315 
Gram matrix, 182 

Gram-Schmidt orthogonalization, 201
Green’s function, 214 
Group algebra, 286, 287, 346 

H 

Haar basis, 212 
Hadamard matrices, 219, 256 
Hahn-Banach theorem, 232, 236 
Hamel basis, 209 
Hamming distance, 15
Hausdorff, 28 

maximality principle, 92, 202, 234, 281
Hausdorff-Toeplitz theorem, 357 
Heine-Borel theorem, 74 
Hermite functions, 207 
Hermitian operator, see self-adjoint operator 
Hilbert, 174 
cube, 148 

space, 174, 369, 389 
Hilbert-Schmidt operator, 368 
Hölder’s inequality, 151, 161
Holomorphic function, see analytic function 
Homeomorphism, 34, 64 
Homomorphism, see morphism 


I 

Ideal, 280, 333, 366, 369, 391 
Idempotent, 285, 330, 353 
Ill-conditioned equation, 130 
Image 

of an operator, 116 
reconstruction, 197 
Imaginary part, 349 
Index 

of a Fredholm operator, 229 
theorem, 229 
Inner product, 171, 369 
Integers, 14, 22, 57 
Integration, 121, 268 
by parts, 265 
change of variable, 265 
Interior, 17 

Intermediate value theorem, 60 

Interval, 59 

Inverse problem, 191 

Invertible elements, group of, 295 

Involution, 345 

Isolated point, 20 

Isometry, 50, 186, 210, 318, 376, 382, 384 
partial, 385 
Isomorphism, 34, 128, 189, 222, 282, 340,
377 


J 

Jacobi algorithm, 127 
Jacobson radical, 331, 339 
Jordan canonical form, 323 
JPEG, 217 


K 

Kernel, 282, 338 
Dirichlet, 248 
of an operator, 116
Kummer’s convergence test, 112 
Kuratowski, 58 


L 

L’Hôpital’s rule, 261

Laguerre functions, 207 

Laplace transform, 342 

Laurent series, 301

Least squares approximation, 182 

Lebesgue, 160 

Legendre polynomials, 206 

Leibniz test, 109 

Liminf, 31

Limit, 27 

Limit point, 21 

Limsup, 31 

Linear transformation, 115, 258
Linearly independent vectors, 90, 313
Liouville’s theorem, 301 
Lipschitz function, 50, 67, 85, 116 
Locally connected, 64 
Logarithm function, 328 


M 

Map, see function 

Matrix, 117, 124, 172, 190, 283, 285, 307, 
341, 366, 372 

Maximum modulus principle, 272 
Mean value theorem, 263 
Metric, see distance 
Metric space, 13 

completion of, 43, 49 
discrete, 68, 76 
equivalent, 50, 67 
isometric, 50 
separable, 53, 68, 85 
Minimal polynomial, 325, 329 


Minkowski, 151 

inequality, 151, 164 
semi-norm, 101 
Möbius transformation, 352
Morphism, 282, 338, 345, 382 
Multiplication operator, 143, 199, 245, 315, 
325, 352, 355 


N 

Natural numbers, 14 
Neighborhood, 17 
Newton-Raphson algorithm, 265 
Nilpotent, 285, 330, 331, 347 
generalized, see quasinilpotent 
Norm, 94 

completion, 107, 174, 238 
equivalent, 97 
Normal element, 348, 353 
Nowhere dense set, 25, 46 
Nullity, 126 
Numerical range, 356 

O 

Open 

ball, see ball 
cover, 70 

mapping theorem, 221 
set, 16 
Operator 

compact, 226, 321, 346, 369, 383 
continuous, 115, 258 
integral, 119, 127, 228 
multiplication, 119, 143, 199, 245, 315, 
325 

multiplier, 352, 355 

shift, 118, 129, 146, 190, 231, 242, 246, 
299, 314, 317, 324, 330, 352, 359 
trace-class, 373, 391 
Orthogonal, 172 

Orthonormal basis, 201, 209, 355, 361 


P 

Parallelogram law, 175 
Parseval’s identity, 202 
Partial isometry, 385 
Path, 61 

Path-connected set, 64, 104, 296 
Perpendicular, see orthogonal 
Point spectrum, 313 
Pointwise convergence, 78, 246 
Polar decomposition, 384 
Polarization identity, 175, 346
Polynomial, 35, 80, 105, 137, 183, 310 
Positive functional, 384 
Power series, 288, 347 
Power spectrum, 167, 168 
Pre-annihilator, 237 
Product space 

Banach algebras, 279 
Banach spaces, 106 
C*-algebras, 346 
metric spaces, 15 
normed spaces, 97 
vector spaces, 90 

Projection, 130, 224, 231, 330, 355, 363, 383 
orthogonal, 180 
Pseudo-distance, 26 
Pythagoras’ theorem, 172, 202 


Q 

QR decomposition, 218 
Quadratic form, 357 
Quasinilpotent, 331, 339, 347, 351 
Quotient space, 131 


R 

Raabe’s convergence test, 112 
Radical, 331, 339 
Radius of convergence, 289 
Rank, 126, 227, 366 
Ratio convergence test, 111
Rational numbers, 14, 40, 46, 61 
Rayleigh coefficient, 356 
Real numbers, 14, 22, 59 
construction of, 40 
Real part, 349 
Reflexive space, 238 
Regression, 193 
Residual spectrum, 313, 354 
Resolvent set, 307 
Riesz, 187 
lemma, 321 
map, 186 
theorem, 322 
Rodrigues’ formula, 208 
Root convergence test, 110
Rouché’s theorem, 273

S 

Scalar multiplication, 89, 101 
Schauder, 110 
basis, 110, 118 


Schrödinger equation, 342
Schur’s test, 124 
Schur’s theorem, 255 
Schwarz inequality, 173 
Self-adjoint element, 348 
positive, 378 

Semi-metric, see pseudo-distance 
Semi-simple Banach algebra, 331 
Separable, 53, 68, 85, 202 
Separating hyperplane theorem, 235 
Sequence 

asymptotic, 38 
Cauchy, 38, 49, 67, 68 
convergent, 27 
divergent, 27 
increasing, 39 
rearrangement, 29 

Sequence spaces, 95, 139, 172, 178, 204, 
278, 335, 341, 346 
Series, 108 

convergence tests, 110 
power, 288 
rearrangement, 109 

Shift operator, 129, 146, 190, 231, 242, 246, 
299, 314, 317, 324, 330, 352, 359 
Simple algebra, 367 
Singular value decomposition, 362 
Spanning vectors, 90 
Spectral 

mapping theorem, 328 
radius, 288, 309 

theorem for compact normal operators, 
361 

Spectrum, 282, 307, 322 
continuous, 313 
normal element, 377 
of an algebra, 334 
of an operator, 312 
point, 313 
residual, 313, 354 
Sphere, 36, 100 
Spherical harmonics, 208 
Spline, 80, 162 
Square root operator, 380 
Standard deviation, 357 
State space, 334, 378 
Stone, 83 

Stone-Weierstrass theorem, 80, 83
Strong convergence, 246 
Subalgebra, 280 
Subsequence, 29, 38 
Sylvester’s inequality, 126 
T

Tangent, 259 
Taylor 

series, 300 
theorem, 264 

Tikhonov regularization, 194, 365 

Tomography, 197 

Topological 

divisor of zero, see divisor of zero, topological

nilpotent, see quasinilpotent 
space, 26 

Total set, see orthonormal basis 
Totally bounded set, 67, 104, 136, 226 
Totally disconnected, 64 
Trace, 368 

Trace-class operator, 373, 391 
Translation, 352 
Transpose, see adjoint operator 
Triangle inequality, 13 
Trotter formula, 294 

U 

Uncertainty principle, 357 
Uniform 

boundedness theorem, 246
convergence, 80 

Uniformly continuous function, 48, 68, 73 
Unitary 

element, 348, 349 
operator, 189 


V 

Vector addition, 101 
Vector space, 89 
addition, 89 
linear subspace, 90 
Volterra, 122 

operator, 121, 199, 318, 340, 373 
Von Neumann, 390 


W 

Walsh basis, 219 
Wavelet bases, 212 
Weak convergence, 246, 248 
Weakly 

bounded set, 250 
closed set, 253 
Weierstrass, 83 
M-test, 113 
Weight, 113, 172 
Well-ordering principle, 92 
Wiener deconvolution, 196 
Wiener’s theorem, 341 
Wiener-Khinchin theorem, 168
Windowed Fourier bases, 211 


Z 

Zorn’s lemma, 92 


